Moore might have been able to design and build Intel 14900k with his bare hands and a piece of paper, but he was not able to learn the latest piece of shit frontend framework in 3 days after it was released in 4chan
@@billywhite1403 Yep. The main problem with Moore's law is clock speeds, power consumption, and die/process manufacturing size (excluding design problems like clock timing amongst other things). All those 3 things combined will slowly kill transistor count.
@@mirakekkle9476 I'm not saying you're wrong, but I do think they were saying the same sort of thing thing in like 1987 (Just to pick a random year before things got really digitized, aka before a paradigm shift in chip design, aka we could be standing on the threshold and not know it. I'm sure there is a limit on physical transistors. But there are so many other ways to indicate 0 and 1, especially on the particle scale, I can imagine we figure that out within the next few decades
What are you on? Benchmarks should never be in percentages. Do you see CPU and GPU benchmarks in percentages? No because we don't know the upper limit so we can't logically create percentages. Same with LLMs and the like. For whatever reason people are using percentages, but that doesn't mean it's correct. For example, we have an IQ system with a max of 200. But we decided that with some parameters in mind. But there could be conscious entities, non-humans, which could have an IQ of 1 billion. They could do all the computation of humanity till date in 1 sec. We just don't know.
@@SahilP2648 True, but those percentages are according to benchmarks. Also benchmarks should be adding more complex tests while the LLMs are improving.
Yup. Gemma 2 2b (2.6b actually) is really good at translation, but when it struggles, I go straight to Mistral Large 2, which is more that 47x its size.
8:50 that is absolutely the wrong way to look at those graphics. that isn't a chart with uncapped ceiling, it tops at 100. so if we invert the data to show "mistakes" (just a way of saying) instead of the score, you'll notice that jumping from gpt-4 turbo1 to gpt-4o, which in the graph is roughly 74 to 80 (or a "8% improvement), is actually a 23% reduction in mistakes if you notice that it went from failing 26% of the time to just 20. and reliability in a system is not something that is valued linearly. a system with only 50% accuracy is as good as useless, while something approaching 90% starts to get actually useful. According to that graph, gpt-3.5 is probably around 57 score. compared to gpt-4's score of roughly 67, that was a reduction of only 15%. so gpt4t to gpt4o was a bigger jump in reliability.
not to mention the massive speed improvements and cost reduction for using these models within a year's time. Not sure how much of that can be placed on other factors (infrastructure, funding..) but Microsoft has been backing openAI for a while now so I'll put it down on the models improving (though it could be the increasing competition..). I think it is the models that have been made significantly faster and cheaper than before.
yeah this dude started with a conclusion and tried to find arguments to support it. I'm definitely "worried" about a slowdown, but this video actually made me slightly less worried lol
This claim of 'smaller and smaller wins' also assumes that these benchmarks scale linearly, which likely isn't true. It's possible that while 70% of the questions are easy to answer, each subsequent 5% is significantly harder, than the last and require more training. So a 'small' n% gain in performance GPT-4o gains over GPT-4 may actually measure a significant jump over a subset of very difficult questions.
As we reach higher benchmark scores, you have to flip your understanding of them. If accuracy goes from 80% to 90%, that feels like a 10% improvement, but in reality the error rate has gone down by half, which is basically a 100% improvement.
You also forgot to mention how extremly flawed, early-stage and biased the ai tests are that are supposed to measure their performance. It is way too abstract to just slap a number on it. You dont say "this guy has 76 intelligence" or "this guy is 13.2% smarter than the other guy". It's artifical intelligence, intelligence, the thing we have. So as hard as it is to accurately measure our own true intelligence as difficult it will be to measure ai intelligence, but that doesn't it plateaued and it doesnt mean the improvements wont be noticeable. Its honestly laughable to think that the top-notch scientific masterminds of this generation and the most valueable companies in the world are all wrong and wasting money, because a guy on youtube read a graph and said "see? It's plateauing!"
I totally understand that a lot of people don't like "AI" (what they actually mean is generative AI), because it became a symbol for everything that is wrong with big tech, just as Glyphosate became a symbol for everything that is wrong with industrial farming. Yet I'm baffled how even tech-savvy people only focus on the results not being completely perfect, while ignoring how much of an achievement it already is to get almost-perfect results. We have computers that easily pass the Turing test.
@@BigSources You might want to look into survivorship bias. Plenty of big companies have invested mind-numbing amounts of money into things that never panned out. It's the nature of the game.
corrections: 3:00 compares SINGLE thread with an ever growing number of gpu cores lol. i find this graphic highly misleading. peak operations per second continue to follow moores law even on CPUs. people said moores law is plateau-ing for decades now yet it hardly slows down. the reason why you dont notice a difference between the 10700 and the 14900 is because both are "good enough" for normal every day tasks. however in productivity workloads like cinebench 2024 the 10700k scores around 740 points while the 14900k scores 2180. thats almost 3 times as much. 6:04 no, you can run, look at and modify any llama model code. its on their public github repo. you still havent learned that after your bad rant about meta. mistral on the other hand only provides the executable, not the code. difference between open weight (mistral) and open source (meta). 7:30 apple certainly didnt invent hardware accelerators or the the big-little architecture. also neither of that wouldnt show up on the funky graph you showed anyway because it only includes single core performance. 8:10 that is probably one of the worst charts you could have picked to show the slowing down of progress though i do somewhat agree with you.
I'm actually getting pretty sick of how under-researched his videos are. I noticed this with the Llama 3.1 video where he absolutely butchered what the model size means.
The graph at 3:00 is classic NVidia shenanigans. These guys are totally incapable to ever produce a graph that's both correct and not misleading. And useful. Other than that, yeah, good corrections. There's even more minor corrections, if someone would be pedantic. I think that Theo deep down knows he's not accurate / properly correct, but overall he gets the big idea pretty well. Edit: one thing that I can give Theo is that he does listen to the suggestions and corrections, well, as much as he can take in a limited amount of time. You can see in this video that when he showed the fast inverse square root he didn't say that John Carmack invented it. That's an improvement on his part.
small factual correction: "One of the crazy things apple invented, was the idea of having different cores with different roles"... No they didn't, it was actually ARM. Hetereogeneous Computing strategies and Big/Little architectures were not invented by apple :)
It feels less like LLMs have plateaued and more like the benchmarks are all being gamed and optimized for. Claude 3.5 sonnet, for example, is a cut above all other models.
it's not so drastically better that it doesn't belong in the same conversation. The amount of "I apologize, you're absolutely right" I get from Claude doing the typical LLM mistakes almost reads like a parody sometimes.
@@PraiseYeezus I think what shows this more is what happened with chatgpt. GPT-4o is drastically dumber than GPT-4, yet it's somehow considered a giant upgrade according to the published metrics.
I don't understand what you mean by benchmarks being "gamed" and optimized for? For what? AI/ML will plateau if it hasn't already. This is unavoidable. Maybe there will be small improvements in the same sense hardware plateaued a long time ago and there are only tiny improvements. I realize manufacturers advertise their new hardware as some massive upgrade, it's not. Maybe they'll do the same for AI models. Case in point, I still run a 2012 pc. Long gone are the days where a 286 was more like 1000 (that is one thousand) times faster than a 8086 and the difference was unreal compared to an not so older say 6510 or say a Z80. Now you might get point clock increments and maybe some more cores to compensate. The same thing will happen with AI programs, those who envision singularities are fools. Whether those singularities are AI or black holes, they all be fools imnsho. i.imgflip.com/3pvz1p.jpg
I disagree with the interoperation of graph at 8:39 , Its a benchmark score out of 100. It will always asymptote and isn't analogous to a TOPS or Transistor count graphs. To see a real asymptote we would want a harder benchmark where we start at lik 0-20% performance, go up to like 50-60% with newer models but stop improving there and well away from a human expert score on the same benchmark
And also time of the release on x axis dont measure how much data were those models trained on nor how much computing power they needed. If there was any plateau it wouldn't be on potential growth of GenAI with bigger models
@@LucyAGI we expect all the things tech bros promised when they were hyping chat gpt when it first appeared. Now that it does not deliver , you all cope and become wordcels to justify the failure .
@@LucyAGI Goal is, make an AI smart and general enough to where it can perform the tasks of a Machine learning expert and work on ai itself, if you can inference a lot of them, then you have 1000 experts (or more) making ai smarter, more efficient so on, thats where the exponential would kick in (theoretically, we dont know where the ceiling on intelligence is so the "exponential" could be short-lived) When we'll get there is questionable. *maybe* we'll get there with LLM based systems, i believe we'll see llms being used as a weak reasoning engines in a system of tools, verifiers and agents within the next year, possible this falls through though and that llms scaling doesnt make them any better meaning we hit a hard stop ceiling and need to find another architecture alltogether but imo, thats unlikely as of right now / too early to say (as we havent made the aforementioned tools yet)
Yeah came here to post this, agree completely that all benchmarks would look like the scores are flattening as they reach the top. Also with benchmarks like MMLU they've been through carefully and found a bunch of the questions have issues with them such as no right answer, multiple right answers, missing information in the question etc, which means that no system will get above a certain level.
@@jmd448 Probably better benchmarks will come out as both performance and expectations evolve. The real issue here is that there are no cold hard metrics to compare, even comparing numbers of parameters isn't exactly equivalent to number of transistors in a chip...
Just to note, LLMs are the current focus and are unparalleled for natural language processing, but even if LLMs do plateau, I really do think there is further research + neural net architectures that will give us another boost forward in AI progress, over time I can def see multiple 'model architectures' working in tandem to complete complex work. So basically, I think even if hardware compute advancements are slowing, progress and research into the fundamental technology is accelerating, and I hope we will discover breakthroughs which allow us to derive more intelligence from less raw compute. Yes, etched and others are working on altering the hardware architecture to fit neural nets, but there is much to be said to iterating neural net architectures to utilize raw compute orders of magnitude more efficiently.
well they kind of have already .... Llama 3 8B is really very good considering the amount of compute required. GPT 4o-mini is exceptional considering it is likely a tiny fraction of the size of its big brother. But the entire "scale to improve" thing seems like a hiding to nothing and incredibly inefficient. We need another architecture level eureka.
This happened with re-enforcement learning too. The models had so many nodes that back propagation had virtually no effect, meaning they became too big to train any more and even got worse with more training.
Introduction: AI Plateau Hypothesis (00:00:00) Moore's Law Explained (00:00:29) Decline of Moore's Law (00:01:59) GPU vs CPU Performance Trends (00:04:28) Analog AI Chips and Specialized Compute (00:05:24) Rise of Open-Source AI Models (00:06:18) Potential Plateau in AI Model Improvements (00:07:20) The Bitter Lesson: Focus on General Methods (00:09:53) Importance of Algorithmic Innovations (00:12:33) Future of AI: New Architectures and Methods (00:14:33) Hype Cycle for Artificial Intelligence (00:15:30) The ARC Prize and AGI Benchmarking (00:18:56) Conclusion: Future AI Beyond LLMs (00:21:42)
You really misread that graph around 17:00... it's not a progression of a single technology across the hype cycle with every point being a further refinement that brings it further along the cycle, but rather each point is an independent technology that currently exists at some point of its own hype cycle.
It is. Though it is from 2011, and Apple was using ARM for the iPhones since 2009, right ? They could've had some input on that, though the wikipedia article about big.LITTLE doesn't say anything about Apple in the initial development.
Apple's M series are completely custom designs that use the ARM ISA. But they're significantly different from, say, a Cortex, and contain significant extensions like an option to use a stricter x86 style memory model to make Rosetta2 work at near native x86 speed.
Some rebuttals here. First, I am an Engineer at Hugging Face, working on generative modeling, which, to me, is not JUST language models. It also include {image,video,speech}-generation models, too. I am mentioning it at the beginning of my comment to let others know that I am not a podcaster nor an influencer who doesn't know his shit. 1> Too bad that you only cared about compute and that too only from a single provider i.e., NVIDIA. TPUs have been there for a long time and has a moderate share of the market. They don't have the problems that NVIDIA GPUs have such as shared memory, availability, etc. It's a completely different stack and when done correctly, can be faster than GPUs. TPUs are, of course, primarily used by Google to train and serve their flagship models. But other companies such as Apple, Midjourney, etc. have been using them as well. 2> You only showcased public benchmarks but in reality, a business that is remotely serious about integrating "AI stuff" will have internal and private benchmarks which will keep on evolving. Also FWIW, in the domain of {image,video}-generation none of the models exhibit the performance ceiling yet. Let's not discard those models as they are quite business-friendly. 3> Model architectures and modalities exhibit different properties and hence, their dynamics are different. For example, Transformers lack inductive priors as CNNs, so, they usually need more data to counter that. Then, you have LLMs which tend to be memory-bound, whereas diffusion models tend to be both compute and memory bound. This leads to different engineering practices. 4> 3. brings me to this point. We have seen technological advancements across all facets -- data, architecture, pre-training, optimization, etc. Not all these axes have been fully exhausted. As long as there is a good dynamic in any one of them, things will keep improving is what I believe.
@@dirremoire A RUclipsr with a huge audience, which leads to influence and power over his audience. Every content creator should be aware that the mistakes they make in their videos are going to be parroted from then on until someone with a similar audience size can disprove them - but by then, the damage was already done.
I’m just a regular person, not a tech or engineering guy-I had to look up to see what TPUs were, for example-and your comment was very helpful and informative. Thanks!
Clearly the things that you cite are why Kurzweil doesn't use Moore's Law as the basis for his projections. He uses "computations per second per dollar", which is tech agnostic. I work with LLM APIs regularly and am convinced that they are great with Kahneman's System 1 thinking, and once System 2 is brought online, the floodgates will open for widespread use.
He is right on a few things, that is, Theo. We consistently over reach our expectations whenever technology moves forward and pretend that this time is different. Your response here is clearly another example of that as you have not really rebutted anything so much as chosen cavoites. It does make sense from a point of view for sure, that is; you are in the thick of things working on it every day, your likelihood to be overly optimistic about what you work on is high, meanwhile a guy like Theo is dealing with a double whammy of obsolescence, not only his ability to be an influencer is under threat by content creation, his primary skill as a developer is also under threat. My point here is that you both have an extreme bias towards the situation which is part and parcel why you responded at all. For Number 1; Not saying that TPU's aren't a better platform but this is probably more akin to comparing ARM To x86 architectures. ARM has been "better for a decade". But why isn't it fully adopted still? Exactly. You want to just ignore that entirely. I feel like number 2 is really just changing the discussion. 3 and 4, I guess time will tell. I personally have noticed that the expectations have gone up on me dramatically as a programmer, and as I use these tools, I often find that they still are unable to do basic anything and it requires so much work to create a final solution that I just would have been best to type it out myself and mentally break down the problem myself. AI is good as a text processor to help me manage and maintain lists and do mass text manipulation, its great at helping inspire me towards choosing a direction, but its ultimately terrible at doing anything really intelligent. For sure we are slowing down, I think even you are agreeing with that, but there is no reason we are going to slow down forever, it may speed up again; trying to predict this is a bit insane.
Even amidst authoritative data sources used during training, not all factually accurate data is equally true or helpful given context. And context is something that LLMs struggle with and sometimes hyper fixate upon. For example: “The sky is blue.” True, but… is it as true as “The sky is sky blue”? What about “the cloudless sky at mid-day has the optical wavelength: 470nm”?
That’s not actually a problem. That was a silly little concept that Yann LeCun postulated and which has been proven completely wrong; this does not cause problems for LLMs at all.
@@jimmyn8574 Generally if the solution is simple and obvious - you don't understand the problem. I have found this to be true of almost everything in code and life in general. Nothing is every easy. Learn. Try hard. Be kind. Best of luck!
The message of the video is correct, but I can tell you have no idea of what you're talking about, you're just making wrong connections and interpretations of what's in front of you.
The problem with Moore's Law is it became Moore's Yearly milestone. I'd argue once they realized the trend they would withhold performance knowing putting too much into a gen will guarantee a very small bump in next gen because we are reaching physics barriers where transistors are getting too close to isolate their signals / have them behave properly and hold up over time.
Intel did that with their 14nm . That's now why they have competition from TSMC and Samsung. They're going to eat that and have to sell all their factories. Their production methods are still behind others. They thought they had a monopoly and were able to keep the next steps and amortize it for the next Gen . Didn't work, their competitors went ahead to the next nodes. But even them are now struggling with their fake 3nm. There's no real 3nm. It's over. CMOS is done, that mine won't put out more diamonds. And any other method ironically wasn't researched because it would be a step down to performance levels of 1990 until 5 generations later, finally it out paces CMOS, that would be 2030. They had to have stopped putting everything on CMOS tech in 2015. It's too late to keep the Moore's law now. Its dead.
@@monad_tcp Yep I can't say I'm surprised what's happening to Intel is happening. You want to be the master of the clock, but it's becoming a honed samauri that's recently fading vs a machine gun. Intel just kept pushing smaller gains for bigger costs. I moved desktops where I could to AMD. The power efficiency savings especially heavy load over time made it easy. I started using macbooks once apple had the mac silica ARM. Late Intel chips were toasters in that aluminum body == recipe for early death. It was also stupid for mobile-use. I it to last long and not worry about "oh no turn off bluetooth, don't play spotify, dont have youtube going". I borrowed a friend's X1 carbon and it lasted like 2.5-3 hrs of real use with its Intel i9 vs easy 8 hrs from the mac. I'll want to see a few good gens in a row to consider Intel again unless really really really is best use-case.
We would get photonics-based processors and maybe IGZO as the material change. Shunpei Yoshida the original creator of the LCD said IGZO is a really good candidate, offering upto 100x speeds with the same manufacturing and architecture.
The AI model release chart doesn't look like it's plateaued yet as Theo stated. It still looks like it's still linearly progressing in terms of benchmark scores relative to time for each of the three models.
A language model can read all the websites and books it wants to about how to ride a bike, but unless it actually has to control a bike and perform specific tasks it will suck at them, it will never be an expert on riding a bike. This is the fundamental flaw with current LLM models, they're fancy search engines (they have a use - but also a limit)
@@entropiceffect No if an LLM is big enough according to transformer scaling laws, its outputs will be undistinguishable from its inputs, so yeah a 405B model wont be able to but a 405T model will be able to. For example, if an LLM did have to ride a bike you'd give it a picture of the bike your using and than use an LLM to predict body movements, right? Well, since it has learned so much about bikes in its training data and because it is so large and can generalize so well it could do the task with ease. We never though an LLM could reason or simulate the world but when we scaled it up it could do it. This video literally stated that transformers wouldn't get better anymore even though all the newest papers show completely linear improvements eg llama 3 paper, this video was just complete misinformation
@@exponentialXPIt’s not the same forever; scaling all plateau eventually. But, we’re not there yet and he’s wrong to say we’ve plateaued. He made a number of mistakes in this video.
@@entropiceffectNo, that’s simply not true. You’re talking about tacit knowledge as opposed to explicit knowledge, but you don’t need tacit knowledge to be an expert at something; you only need tacit knowledge to actually _do_ that thing. Since we don’t need LLMs to actually ride around on bikes, they don’t need a that tacit knowledge. But LLMs can easily be experts on how to ride bikes (and many already are).
8:35 i feel like this is missing that each of the models on the GPT4 line are getting smaller and cheeper; they are not intended to be smarter they just happen to be, the intent is to get the thing to run on a phone.
regarding improvements over time - as Nate B Jones pointed out on his channel recently, we're measuring progress incorrectly for where we're at on that curve. while going from 90% to 95% in a benchmark looks like a small improvement, this is actually a whopping 2x improvement *in error rate* which is what we're trying to reduce now. when you're measuring progress towards a limited goal of 100%, progress is going to look slow towards the end, but that probably just means we need new benchmarks and different ways to evaluate the results. with regards to the ARC-AGI test: I don't understand how or why this is a test of language models - it's not a language problem, as far as I can tell? and I'd expect, if there's an AI (not an LLM) that can teach itself to play Go, the same AI would be able to figure out the simple rules of this "game"? so this just looks like a simple case of "wrong tool for the job"? I wouldn't expect a language model to be good at this, why would it be?
@@SahilP2648 there's no real agreement on the exact definition of AGI, as far as I'm aware. but I don't think it's merely about perfecting current benchmarks.
The most 'general thinking' AI systems we have are LLMs I think the idea of the ARC-AGI test is to try and get the LLMs to do logic and maybe math, because that's the part the LLMs are bad at. Or maybe they want to have a good way for LLMs to outsource it to a module which can do it correctly. Any solution would do.
@@RasmusSchultz when a model can derive new maths and physics equations like 20th century scientists, we have achieved AGI. That's a hallmark test. Doing this currently requires human intelligence and machines can't help, only experimentally. Theoretical physics is bound by human intelligence.
@@SahilP2648 this just doesn't sound like something a language model can do, at least not in their current form, being essentially just next word prediction. as you said, maybe through tool use, but what kind of tools? we already let them run Python code. I don't know. we'll see I guess. I'm a lot more skeptical than I was 3 months ago, because I don't feel like we're seeing the same breakthroughs, or even promises of breakthroughs. but we'll see 🙂
The reason performance is plateuing is because the scores are getting close to the maximum value of 100%. You obviously can't increase by 30%+ once you get to 90%. This is the S-curve phenomenon.
I think the whole point of singularity isn't that each specific component will get infinitely better, it's that that component will help us discover new components in which those components will help us discover new ones. And the rate of speed between accelerating quicker and quicker. Just as agriculture was a component that lead to the industrial revolution Then the industrial revolution to electricity Electricity to the computer age Computer age to internet Internet to AI AI to ... And all of these things had sub branches. Computers helping with simulations. Electricity to new forms of motors etc...
you have no idea what you are talking about, turbo is a smaller faster version of gpt 4, of course it wouldnt be better, it was a leap for it to have clkose to the same performance, and 4o was an even smaller model that is multimodel, so it can input and output text, image and audio all at once, which has never been done before at that performance level
Bro there are tons of experts saying that we are plateauing, and I have been saying Ai was going to plateau. like, literally, if you understood, how computers work. it would be really hard for you. to try to rationalize. how a. I wasn't going to plateau.
@@EDM-Jigger there is a relationship between loss and compute, basically we are bottlenecked by compute, bear in mind current AI models are less than 0.1% of the size of one human brain, and our architectures arent as efficient as the brain. We are still very early on in AI and it wont plateau because we will keep finding ways to improve models
@@incription Yes, and or no, I'm going to get any stronger underneath a classical compute logic. How would it if it's 0.1 percent the size of the human in. a 4.7 gigahertz. processing core is only capable. of completing 4.7 billion calculations a second. and we can't cramt transistor zone any smaller. without having an electrical current jump to the next transistor creating an ecc error. in the L-3 cache of your CPU. How on Earth do you think we're going to get that 4 billion into the trillions? underneath this classical compute logic. of true and false which stands for one. or zero.
Thing AI advocates always miss, is that the growth of these models is rapidly constrained by compute and energy resources, those exponential curves assume society decides that AI is a good place to direct all those resources
Hubris. I tried to say to some of these enthusiasts how the electricity consumption by these models is not very viable. The only response I got, 'hur dur, it will get more efficient'. As someone from a third-world, I understand there far more important things electricity is needed for than text and image generations.
@@thegrumpydeveloperit always make me angry. Why don't we have an ATX standard for GPUs to decouple everything , why do we need to buy RAM from Nvidia at extortion prices for the same shit everyone uses in their main memory. VRAM is RAM, the only difference is that your CPU bus is 512bits (4 channels of 64bits) and GPUs are 4096bits and they have a couple of extra lines for synchronization as their access is less random. But it's roughly the same shit, witn small changes in the memory controller.
@@3_smh_3 they don't care it cost $1/h to use GPUs and when the fake marketing idea of AI is to replace humans with minimum wage of $15/h. Capitalists salivate at that thought. No one is looking at countries that earn less than that, or where energy is expensive. Also third world countries are corrupt, their energy cost is usually way, way less than rich countries, it's all taxes on energy, that's why they're poor. The energy cost is actually not higher because there's little demand. Which is ironical as the data labeling used to make AI is probably $0.2/h for human actually doing the work. AI is basically just outsourcing, but in time instead of space.
Turbo and Omni models aren't about scaling up. They already said after GPT-4 was trained that their plan was to first make them efficient and multimodal before scaling up again, since that was far more important and sustainable. I'll wait for actual data on the next model up before I make assertions that require it.
Except they're getting smaller as time goes on now. Smaller yet more efficient. If rumors end up being true, gpt-4o mini is something like 8 billion parameters.
@@timsell8751it’s both. The largest models are getting larger and the smaller models are catching up quick to the previous generation of the larger models
@@timsell8751 that isn't an argument against, if the newer traning techniques are applied to the larger models we'll see a jump. They haven't been applied coz it's probably expensive to retrain the larger ones. If anything shows that the more efficient techniques will create an even bigger step up for gptv5
“we are in Moore’s Law squared” - Jensen ai is beginning to synthesize data and creating its own ai models that will create its own synthesized data and ai models based on what taught it, etc. in a positive loop 🔁. (This is September 22, 2024 today). Eventually this will grow at such an automated pace that we will be living with AGI then ASI. it’s kinda hard to imagine what things will be like even by September 2025.
Apple M1 single thread geekbench 6 score = ~2400 Apple M4 single thread geekbench 6 score = ~3700 So 54% improvement in 3 years. ~15% per year. MultiThreaded performance can increase as much as you want to pay for extra cores.
I love that you take the hype cycle chart as fact, based on what data did people determine that AI's hype and decline looks like that. What data points were used to construct the chart at 15:46? Looks like its made up.
@@Tom-rt2 Even at the current plateau, AI renders vast business practices obsolete. We haven't even begun adapting it. I just went to the DMV to renew my license and was given paper form to fill out.
@@Tom-rt2 There's no current plateau in my opinion. We're FAR from one. Companies are releasing smaller and smaller improvements to their models instead of waiting many months to release big ones. We didn't get GPT 1, 2, 3, 4 one after another, there was significant time between their releases, and this time Sam himself said they'll instead release smaller, more frequent models, hence they're called 4turbo, 4o, etc. If the strawberry leaks are to be believed, GPT5 that they have internally is insanely powerful, it has a much bigger jump than 3 to 4. Except they can't figure out safety, so they won't release it any time soon. If for any reason they're going to slow down, isn't because of a plateau in performance, but a plateau in safety.
I refurbished my PC around 6 years ago with a 1080 NVIDIA graphics card. I have yet to need to upgrade it for any reason. Not remotely surprised how quickly AI is plateauing given how much money is being poured into its development.
i get your point, and you are correct. LLM's might hit a plato however, this does not mean it will stop the development of ai products. The product innovation and foundation models do not move at the same speed.
The graph you are showing is for a single-threaded cpu, but we actually started to have multiple cpus working in parallel so the moores law kinda still works
So many bad takes and half-knowledge. Every fucking Technology that uses a computer chip, evolves over many models in a S-curve, and every fucking time we get to the and of one S-curve we have already a new tech that performs worse but than exceeds the old s-curve.. this happen over the last 40 years over thousand times, and this will not stop. this has nothing to do with economics, nothing with bad software or even with society, technology is at a point where we see every 4-6 month a new "thing" that is in its low 10% performance but beats the old "thing" that is already in its 90% performance. Every fucking time. Only Wars,Diseases and market crashes stop this for a fev months.
Aha. Until the next leap forwards. I don’t get how people don’t get that innovation comes in leaps and then increments that optimize the leap, then the next leap etc. With some leaps actually being dead ends long-term. We are so used to the fast moving increments we miss that leaps are happening all the time but take time to become integrated and then incremented upon.
The platou in performance may have something to do with the benchmarks used in the graph, models are achieving 95%+ on MMLU and MATH so there isn't much room to see improvement. Hugging face revamped a bunch of the benchmarks for that reason
Moore's Law is not about it being 2 times "faster" or 2 times more dense, just that we can make working chips with double the amount of transistors (they can still be bigger, more cores and so on, the biggest chips now are usually GPU's). It still works that way although it's moore (pun) like every 2.5 years or so now, and instead of just making things smaller we do more layers and have better heat transfer. And the LLM's were always at a plateau, it's just that we can put in words what the limitations are now. All improvements now is just adding abstractions on top that make better prompts for you, a kind of transpilation and db layer that's not directly related to LLM's at all.
Since you asked, A lot of it is the co-piloting that makes it so remarkable. Sure LLM's on their own won't replace 100% of jobs. However what we are learning is that "good enough" showed up last year and we are learning how to make better use of what we've got. As you mentioned the hardware will be a big part of it, but like you said we have diminishing returns there. What we are quite likely to see is an app or likely brand new kind of software that shifts gears through different kinds of AI. When the first steam engines were pumping water out of coal mines, they had this perspective. We need to realize that we don't need more efficient engines initially, what we need to learn how to adapt what we've got in new ways. *That* is the way we'll find under-the-hood AGI.
But the LLM paradigm is what is considered the future of AI. Almost everything can be viewed the way LLMs do (states, next token prediction) and so all these other AI models would be obsolete if the hype about LLMs and transformers are real.
Was there any graph that showed how GenAI improved over used computing power to train models? I feel like this is the only thing we should look at if we were to believe GenAI can be improved without guiding it in one direction. There were benchmarks scores over release date but that shows like noting
I agree that raw LLM intelligence is slowing. Claude Sonnet 3.5 's big leap for me is the artifact and project knowledge base features they built into the product... not the actual llm itself. The LLM being trained to work smarter with tools is the next incremental improvement. For example, can I fine tune llama 3.1 to work with a custom vscode extension that can search, replace and delete inside my codebase? That would smart tool use. Not AGI. Anyone else thinking like this?
I trained one to predict commands in my command line based on my history. I would never trust any cloud service with that data, I had to do it myself on my own hardware.
Claude 3.5 the model itself is also a big improvement, why? Cuz the small improvements it made over lets say gtp4, ppl were saying it would be very hard to and would take years do but they managed. But yeah artifacts is insane, read somewhere its a system prompt so that's even more insane.
no llms scale log(compute) linearly infinitely with no plateau, the reason it is slowing down is because there is less compute difference between model v2 and v3 eg
I think apart from microchip architecture and other technical issues the main wall AI is going to face will be energy consumption. At a certain point it will drain too much energy from other fields in society making people question if it is worth it.
Why do people keep saying AI isn't going to keep improving when everyone who stays current in the field sees the opposite? The last 8 months have been insane and it's getting even more insane over time.
A lot of people don't want AI to do good, because of X reasons. This means they are biased. Isn't console gaming still kinda popular? Why are there so many people saying it's better than PC gaming? Those two completely different topics have the same answer: people want to feel like they made the right choice. Like if they took the right career, choose the right gaming platform, got the right job, etc. Btw, if you are programmer, and you are reading this thinking that I'm talking about your job, because that's the popular thing to say: no, AI will not replace your job. I know it. You know it. Everyone knows it. Again, people LOVE to say that because they are biased. They love to believe that they were "saved" by dropping out of college, or that they avoided a huge waste of time by not studying, wanting their time partying or in videogames. Again, when people want to believe something, they will. Let them talk. Reality hits hard, and not only programming is among the safest career now that AI is a thing, but also, chances are they are doing one of those jobs that can easily get replaced by AI, actually do their job, and they aren't even paying attention to it.
GPT-4o is obviously much better than GPT-4 turbo. And Claude sonnet 3.5 is better than GPT-4o. And the smaller models like Phi3, Gemma2 and Llama 3.1 are rivaling the much,much bigger legacy ones like GPT-4. As for the benchmarks you can't just look at the increment in absolute value. You have to consider that it gets harder and harder to increase: going from 96->97 will probably require the same amount of "improvement" as going from 70->85. And the benchmarks are also gameable. There are plenty of fine-tunes with significantly higher values than the original models without them seemingly being better in general.
The actual next step here will be similar to what happened in early 2015 with vision models when we realized most weights were useless and wasting compute, we'll optimize. It's a very natural cycle in AI research that has been repeating forever now. However, one thing that is super weird with the current scale of models we have is the emerging ability that some of the models seems to be exhibiting. Like very very large models aren't behaving the same way as their "smaller" counterpart (who still have billions of parameters). Some of the largest models out there behave like they are able to be meta-learner, even without changing their internal weights.[1] What I think will happens now is we'll converge into a semi-universal way of encoding the data, which will force a change and optimization of architectures (which already is kinda happening across the different provider). If you look at the massive success that Meta had with Segment Anything family of model it's pretty clear that's the flow. [2] These two force combined will give us a first truly multi-modal AI that will be less wasteful than current mega models, while keeping the interesting emerging abilities. That + more optimized compute will give way to the next wave of AI advancement for sure. [1] research.google/blog/larger-language-models-do-in-context-learning-differently/ [2] ai.meta.com/sam2/
@@freeottis It’s not an ad hominem. If anything it’s an appeal to authority, but OP wasn’t claiming that the guy is wrong simply because he doesn’t have experience. He was commenting on the extreme confidence this guy makes his claims with, when even experts in the field of AI with 20+ years of experience tend not to speak in such certainties when making predictions, because they’re educated enough to know that it’s a very difficult field to predict. Especially when it’s moving as rapidly as it is now.
I didn't understand 6:14 Llama was released with the model, and the weights, so I'm not sure how "it's not technically open source". Perhaps I misunderstood though.
We still need Nvidia hardware for training thanks to CUDA, but that might change because of the ZLUDA project, allowing AMD GPUs to use CUDA code. But I only have surface level knowledge about all this, so I am not sure. And Groq's chips are specially designed to have very fast access to memory with faster bus or something, allowing faster processing.
Safe to say, investing in Nvidia right now will turn out to be profitable for some time to come. That is, if you believe that AI is just getting started, as I do. Everything I'm reading points to NVIDIA retaining their lead for 2 - 3 years at least. Fabs ain't easy to make. I did not know that about AMD and Cuda though, gonna have to look that one up. That would be huge if true - and it would be great for everyone going forward, as I'm worried about this insane monopoly that NVIDIA could gain/already has. But even then, AMD is 10-15% of market share compared to 75%-85% with Nvidia. Gonna take quite a lot to significantly impact those numbers.
*let me summarize...* 1) language models that "train" using scraped data will massively slow down (overhyped and lack of new data to learn from), whereas 2) Machine Learning that involves setting up a problem to solve and having the computer learn to solve it will accelerate (i.e. NVidia DLSS upscaling... make low res image look like high-res image... and eventually make UNHEALTHY cell look like healthy cell. i.e. devise a treatment)
You are looking at a one-year window in AI abilities, and even if we disregard the fact that moving closer to a 100 percent correctness in tests will require an exponential effort, there is absolutely no grounds that support that LLMs won't change in structure and training material, and that the hardware will not move from GPUs to something more specialized and intelligent. You cant really compare LLMs to something with physical limitations like transistors/CPUs. It is a bit like taking a handful of neurons and mashing them all together expecting to get a brain out of it.
My hunch is that you could run super-human AI on already existing hardware, just by improving software architectures. The chance that hardware development will flatten at a level below what will be required for super-human AI is in my view zero, even if you take the most negative view on future hardware developments. Physics does ultimately set a limit on how much intelligence you can get for a certain amount of time and energy. Intelligence itself might have a plateau; after all, once you figure out that 2 + 2 = 4, it's hard to improve on that answer. I don't think diminishing progress in computer hardware will stop AI though.
comments are: -criticizing the benchmarks -attacking his credentials -nitpicking semantics without arguing the point -"wait for -...." -accusing people of having no vision while providing no concrete examples of practical uses -comparing their ai girlfriends to the invention of the wheel
Interesting take, but I think the chart's a bit misleading. It cuts off before showing GPT-2 (Feb 2019) and GPT-3 (June 2020), where we saw gradual improvements before a big leap. The increased milestones on the right might look like diminishing returns, but they really reflect growing competition and innovation in the space. We’re seeing more players in the game and continuous advancements rather than a true plateau. The competition is pushing the boundaries, not signaling a slowdown.
We need a quantum supercomputer, and we need a new algorithm designed from the ground up for quantum computers. Research Orch OR theory by Sir Roger Penrose. If this theory is to be believed, then consciousness is because of quantum entanglement which means quantum mechanics plays a curious role. This would also mean all these benchmarks are futile since we can't even predict yet how exponentially superior quantum computers can potentially be, and by extension the AGI systems developed on quantum computers.
@@SahilP2648 Quantum computers aren't a new tech. Its like more that 12 years when it was successfully. Actually, quantum computers aren't traditional computers. They big, expensive and critical science project. I think quantum computers are really hard to work with. We all have to see what future is going to be like. LLM's are great but again they hitting their limitation not on the hardware end I would say (not at the current moment) but dealing with algorithm issue, we need to research more new algorithms. And we don't have quality data, new legislation are going to be another challenge.
I'm always amazed when someone brings up quantum computers, because quantum computers just solve 1 class of math problem we can't solve with traditional computers. Pretty certain the GPUs we use for LLMs/AI, don't depend on this math problem class or aren't limited by this specific math problem class. What I do know the specific class of math problem, GPUs obviously are the most efficient for it (when not using quantum computing), because they are the best for math. Their are obviously people working on trying to use quantum computing for AI, but that's a big research topic with no easy solution.
I had this realization recently. I've always felt closely tied to technology, especially when you're constantly having dreams and visions about the "future". What's interesting, while playing Atari 2600 40+ years ago, I had a vision of a future computer/game console that was a crystal or diamond cube that had "Mind-Fi", a literal program that played in your mind that mimicked reality. No wires, plugs, controllers, or physical storage medium. Just a 3" clear cube lol
Cell phones are a great example of tech reaching a plateau. When was the last time we had anything but incremental improvements to our phones? AI is running out of information to train on.
Ahhhh it is so validating to see o1 drop. You're all misunderstanding. AI progress is not about progress in one regime, it is a uniquely synergetic field. Looking at progress of one part of the whole does not give you an appropriate perspective on what's happening.
No shit a score out of 100 plateaus rapidly. You're completely wrong about OOM increases in computational load on these models, GPT4o was CHEAPER to train than GPT4, there was not a comparable OOM change as there was from 3.5 to 4. There are very simple reasons to argue that LLM performance might be plateauing and that it certainly will pretty soon, but this chart is not it.
The only reason we are reaching a plateau is because they're being optimized for a set of benchmarks. We need more benchmarks so the models can get good at everything
Tell that to anyone working in mechanical engineering. Every technology hits a limit. Automotive technology is a perfect example of something that has had minor improvements over a 100 years of development. Cars are not getting substantially faster. If there's a Moore's law in automotive it's measured in centuries not years.
@@katethomas1519 Actually, I think saying that mechanical engineering has neared or reached its limit is off the mark. There are still so many advancements happening, like in compliant mechanisms, which could revolutionize things by making parts lighter and more efficient. And let's not forget about new alloys that are stronger and lighter than ever, or 3D printing, which is opening up possibilities for complex designs that were impossible before. Mechanical engineering is far from reaching its limits; if anything, we’re just scratching the surface of what's possible.
@@katethomas1519 Cars aren't getting a lot faster because there's no need for them to be any faster. But companies do see a need for AI to keep improving.
super narrow perspective. CPUs single core performance has not been increasing rapidly, but energy efficiency/usage has gone down fully. CPUs no longer are used to full capacity, and you can always parallelize to more coress. When we got to 1nm, 0.5nm we will have fully efficient CPUs, that are becoming cheaper every year due to optimization in manufacturing and reduced marketing. Then it will be all about stacking more cores, utilizing better cooling and probably switching to ARM, which can give huge boosts to compute. Also most things are going to the GPU side of things, so we may soon see a shortage of need for CPUs. Finally till we plato, which will happen in 5-6 years, quantum and AI will be the main pieces. The modern computer will probably be out of usage in 10 years time tops. It's probably gonna be server based connection with a subscription, and you just having a screen, keyboard, mouse to access using satellite or 6G outside, or wifi through home.
Not to mention there is a finite amount of documented human knowledge and art to train AIs with. Increasing the capabilities of AIs will hit a wall very soon when all of Wikipedia etc has been used.
@@merefield2585 that's my point, all models probably already have all of Wikipedia and are stuck waiting for humans to make changes and add more pages. AIs can't really come up with their own ideas (and judge their value) yet so they are forced to wait on humans telling them what is correct and not (and we can't even agree most of the time lol)
@@KarlOlofsson We wouldn't want that tho. AI is a tool and we are the ones taking the decisions. Why would training with Wikipedia already be a problem? If you ask a human to learn something, and he reads all the possible knowledge about it, why would that be a problem?
@@waltercapa5265 because you can't call AI intelligent if it can't figure things out for itself, then it's just a glorified search engine or robot. I call them ML-bots. They have to become self learning to not be dependent on human bottlenecks but that will likely never happen. AIs will likely become very simple assistans but nothing more.
3:48 I got a secondhand M1 (with the minimum memory and storage) from FB marketplace for £200, not including shipping >:) It has a small aesthetic defect but works completely fine! I felt so proud of that deal.
I could be wrong, but I dont think apple had much to do with big little cpu architecture. ARM have had that in their designs for over a decade now. Additonally, the idea of having specific chips that handle specific tasks has been in use by CPU manufacturers for a long time. For example Intel released their Quick Sync technology in 2011 to improve video related operations. So though Apple has this too, its not something Apple has made mainstream or had to "bet" on, its very proven and very common in all cpus from desktops to phones, they are just more abundant in ARM devices due to their reduced instruction set compared to x86
@@luckylanno The problem solving aspect. Ability to figure out effective solutions to novel problems (Not seen anywhere before). I think a good place to start for this is the Millennium Prize Problems. AI is allowed to research, learn appropriate problem-solving techniques on its own, but it must solve the problem at hand.
“AI isn't gonna keep improving” he talks about AI reaching a plateau. That makes no sense in this context. It's funny to me that his own starting example proves why his statement is nonsense. AI labs aren't locked in to current designs. They already know its limitations well. And if they find out it won't lead to AGI they will switch directions without hesitation. They Don’t care about trends or predictions based on what already exists because they have always been at the limit of what is known to be possible. The top AI labs goal has always been to exceed what we know to be possible to achieve their goal. We know AGI is physically possible. We know it's a thing that CAN be done. We have existing examples. But we don’t know HOW it can be done. If we just cared about making one AGI we would just have a kid. Or adopt a pet. We already have intelligent systems we can raise and use. The quest he asks that makes me think he's dumb is if we can improve. I can bet my life on “yes” without hesitation. The goal is achievable. The question in the air that he should have been asking is can we achieve the goal in a timely and cost effective manner. That is what the problem is. In real life right now, if labs hit a roadblock it's their job to keep going. And that's how we make progress.
“Oh no we hit a roadblock so I guess we should just give up.” Said no one ever. They will keep going until they get to their goal or run out of money trying.
The reason computers keep getting faster isn't because of some natural force. It got faster because people made it faster. People for various reasons wanted computers to be better at computing, and they have continued to make the breakthroughs necessary to complete that goal. The reason computer architecture continues to change is because the architecture doesn't matter, the goal does. If current computer architecture does not fit our goals then we change it. If current AI systems don’t work, we will simply change it. It wont plateau not because of the existing technology but because of the effort of those working towards their goal.
That's a falacy. Any Computer Scientist who really studied Operating Systems and software evolution understand that we're far off from PROVING we've reached a plateau. My point is - there is absolutely NOTHING that indicates that we've reached a Plateau in software intelligence development. We're just seeing it narrow. The LLM's models, which are purely based on words (strings) might have reached a plateau. Specifically because the theoretical model - transformers - might have reached a plateau. But there are tons of other Machine Learning in the works, not only in language processing but in robotics as well. It's just too silly to think that AI has reached any kind of plateau. We just got started.
The reason for the AI plateau is how bad the major AI Developers are. For example, when it comes to pre-trained Transformer LLMS, there haven't been any real, major advancements since GPT-2. They just keep throwing more resources, and additional side gimmicks, at the existing concept. These create an illusion of progress, but there is no new technology or theory involved. Open AI is the worst thing in machine learning technology, but unfortunately is also the richest, because they've sold out everything they used to pretend to stand for. They are secretive, proprietary, and lazy. They don't produce new ideas or technology, they just keep shoveling money at old technologies.
"For example, when it comes to pre-trained Transformer LLMS, there haven't been any real, major advancements since GPT-2. They just keep throwing more resources, and additional side gimmicks, at the existing concept." That just isn't true. RAG, for example, is a significant innovation, not a gimmick. The capacity scale context windows of LMs from 512/1024 (SOTA GPT-2 at the time) to 4096 on every LM and 1M+ on new big ones is a major innovation (likely involving transformer-alternative architectures). Multimodal LMs are an incredible advancement. And last but certainly not least, the ability to make LMs 500-1000x the size of the largest GPT-2 version, that took innovation in compute resources and model training efficiency. That was all in 5 years. There's no illusion in that progress.
@@dg223-p1r Wow, they are SO corrupt that they just deleted my rebuttal, which is all facts about how every single part of what you described is nothing but a plugin or expanded resources on what is, essentially, GPT 2. They are criminals.
@@dg223-p1r Wow, the criminals keep deleting my entirely-EULA-compliant response. Everything you list is exactly the kind of gimmicks I am talking about. None of them are an advancement of the model.
@@KAZVorpal I'm curious what innovation you feel took place in 2019 with GPT-2. The transformer came out 14 months earlier, Dec. 2017. What was so great about GPT-2 that it counts as an innovation on an architecture but nothing since does?
@@dg223-p1r You've got me there. GPT 2 was the first implementation that worked well, but that's more a Turing Test thing than an innovation in architecture. I was aware of this, but trying to start with the one that fundamentally behaved like the modern version. So, really, OpenAI has created nothing at all, beyond being the most successful implementation of Vaswani's Attention is All You Need. But, again, I felt that is a separate discussion, what I was focusing on is GPT 2 being what feels like the same thing as the modern product.
Autonomous cars are NOT far from functioning, they are driving people around San Francisco. They are here now, hence they are on their way up the second bump in the hype cycle (the realisation phase)
They got forward training based optical modules now. That goes up in energy efficiency and cores speed for a while and multiple colours going through the same circuit. To train something with a trillion parameters a few exa flops will do but if you want to train a quadrillion parameter model you need 1000 yota flops.
@Marduk401 LLM are improved with high-quality data. What do you think happens when they run out of high-quality data, huh? Feeding LLMs their own output doesn't really work.
@@realkyunu there are a bunch of ways to solve this, and the training data limit is unlikely to be reached ever. Do you understand how big we are talking about? It's like saying you'll drink the ocean and leave earth to dry. But instead of getting technical I'll tell you this again. You think that the horse will never get replaced by the car judging from a technology that has been popular for the last 5 years? Yes people said the same things about the car, the tv, the freezer and every technology that ever existed. Give it 5 more years at best. The fact that you think this is gonna somehow stop because of an early suspected limitation like training data ending (so vast that it's currently impossible) Ask yourself this. when in the history of mankind has anything like that just "stopped"?
@Marduk401 "Training data limit is unlikely to be reached ever." When it comes to totally useless data, you are absolutely right. That trash is produced on an incredible rate on the internet. But humans cannot produce enough >high-quality< data to be consumed by the LLMs in order to keep up that rapid growth. ChatGPT had a lot of air up at the beginning, but this will end sooner or later. Maybe not in this or the next year, but after that? But I am curious: How are they gonna train the LLMs after the "end" of tons of high-quality training data? If they have a method to endlessly train their "AI" otherwise, I am totally with you. If they don't have a method to train their LLMs infinitely without using high-quality training data, then those things will replace shit.
Dude, when someone completely develops a "video to video" AI that would be PLENTY to change the entertainment industry for decades to come! All AI art efforts should focus exclusively on that. It's been years since Ebsynth started to develop that.
Yeeee, i love it when random swe make videos about AI. Unless you are a data scientist, ml engineer, or researcher, your opinions mean less than nothing
One of the biggest things that will change how fast things can be is the semiconductor material. Everyone keeps trying to make silicon faster, but we've already found better options.
Calling Gordon Moore a "dev and hardware enthusiast" is hilarious. Dude literally founded Intel
Yeah, this guy is completely clueless. Like saying the internet won't get better in 1997.
@@diamondlion47Thats why i keep on saying frontend devs are not devs💀🌚😅
Down right disrespectful😅, my lord🤦
Moore might have been able to design and build Intel 14900k with his bare hands and a piece of paper, but he was not able to learn the latest piece of shit frontend framework in 3 days after it was released in 4chan
@@warrenarnoldmusic stay mad
Gordon Moore was not a “Dev”. He was the co-founder of Fairchild Semiconductor and Intel (and former CEO of the Intel).
He was an engineer first, so that’s probably where the confusion was from. Definitely not some rando.
@@jonnyeh yes, but he made the “law” as CEO of Intel.
No, he's the only dev, he actually developed something. Because photolithography is developing a chemical to produce a pattern.
yeah some random guy lmaooo just made a guess lmaooo just an uninformed shot in the dark just lucky
I like Theo. But that was one of the most boneheaded things I’ve heard him say.
"Moore - a dev and hardware enthusiast"
-- Theo
That is... technically correct
The best type of correct 😂
@@DelkorYT I thought, if we factor in multicore chips, that Moore's law is holding up pretty well, is this not true?
@@billywhite1403 Yep. The main problem with Moore's law is clock speeds, power consumption, and die/process manufacturing size (excluding design problems like clock timing amongst other things). All those 3 things combined will slowly kill transistor count.
@@mirakekkle9476 I'm not saying you're wrong, but I do think they were saying the same sort of thing thing in like 1987 (Just to pick a random year before things got really digitized, aka before a paradigm shift in chip design, aka we could be standing on the threshold and not know it. I'm sure there is a limit on physical transistors. But there are so many other ways to indicate 0 and 1, especially on the particle scale, I can imagine we figure that out within the next few decades
@@mirakekkle9476 those issues are explained by the breakdown of Denard scaling which happened in the mid 2000s.
The problem as always is when you have a 99% reliable system and you want a 99.9% reliable model. The .9% difference is 10x more than anything else
Yeah, because that's a perception problem. We won't know the actual percentage until we reajust the scale once our knowledge is increased.
What are you on? Benchmarks should never be in percentages. Do you see CPU and GPU benchmarks in percentages? No because we don't know the upper limit so we can't logically create percentages. Same with LLMs and the like. For whatever reason people are using percentages, but that doesn't mean it's correct. For example, we have an IQ system with a max of 200. But we decided that with some parameters in mind. But there could be conscious entities, non-humans, which could have an IQ of 1 billion. They could do all the computation of humanity till date in 1 sec. We just don't know.
You saw NeetCode's vid too huh?
@@SahilP2648 True, but those percentages are according to benchmarks. Also benchmarks should be adding more complex tests while the LLMs are improving.
Yup. Gemma 2 2b (2.6b actually) is really good at translation, but when it struggles, I go straight to Mistral Large 2, which is more that 47x its size.
Calling Gordon Moore a dev/hardware enthusiast would’ve been funny, if it was intended as a joke
i mean its that mean like marines vs stick, where they dumb down some ones credicials for meme
He was a dev. Deal wit it fool.
8:50
that is absolutely the wrong way to look at those graphics. that isn't a chart with uncapped ceiling, it tops at 100. so if we invert the data to show "mistakes" (just a way of saying) instead of the score, you'll notice that jumping from gpt-4 turbo1 to gpt-4o, which in the graph is roughly 74 to 80 (or a "8% improvement), is actually a 23% reduction in mistakes if you notice that it went from failing 26% of the time to just 20. and reliability in a system is not something that is valued linearly. a system with only 50% accuracy is as good as useless, while something approaching 90% starts to get actually useful.
According to that graph, gpt-3.5 is probably around 57 score. compared to gpt-4's score of roughly 67, that was a reduction of only 15%. so gpt4t to gpt4o was a bigger jump in reliability.
This makes a whole lot sense! Great explanation!
Same as uptime. Going from 99% availability to 99.9% is not a 0.9% improvement, but rather a 90% reduction of downtime.
not to mention the massive speed improvements and cost reduction for using these models within a year's time. Not sure how much of that can be placed on other factors (infrastructure, funding..) but Microsoft has been backing openAI for a while now so I'll put it down on the models improving (though it could be the increasing competition..). I think it is the models that have been made significantly faster and cheaper than before.
yeah this dude started with a conclusion and tried to find arguments to support it.
I'm definitely "worried" about a slowdown, but this video actually made me slightly less worried lol
This claim of 'smaller and smaller wins' also assumes that these benchmarks scale linearly, which likely isn't true. It's possible that while 70% of the questions are easy to answer, each subsequent 5% is significantly harder, than the last and require more training. So a 'small' n% gain in performance GPT-4o gains over GPT-4 may actually measure a significant jump over a subset of very difficult questions.
As we reach higher benchmark scores, you have to flip your understanding of them. If accuracy goes from 80% to 90%, that feels like a 10% improvement, but in reality the error rate has gone down by half, which is basically a 100% improvement.
You also forgot to mention how extremly flawed, early-stage and biased the ai tests are that are supposed to measure their performance. It is way too abstract to just slap a number on it. You dont say "this guy has 76 intelligence" or "this guy is 13.2% smarter than the other guy". It's artifical intelligence, intelligence, the thing we have. So as hard as it is to accurately measure our own true intelligence as difficult it will be to measure ai intelligence, but that doesn't it plateaued and it doesnt mean the improvements wont be noticeable. Its honestly laughable to think that the top-notch scientific masterminds of this generation and the most valueable companies in the world are all wrong and wasting money, because a guy on youtube read a graph and said "see? It's plateauing!"
Thank you, I would have needed to write that comment instead, if it were not for you.
I totally understand that a lot of people don't like "AI" (what they actually mean is generative AI), because it became a symbol for everything that is wrong with big tech, just as Glyphosate became a symbol for everything that is wrong with industrial farming. Yet I'm baffled how even tech-savvy people only focus on the results not being completely perfect, while ignoring how much of an achievement it already is to get almost-perfect results. We have computers that easily pass the Turing test.
@@BigSources You might want to look into survivorship bias. Plenty of big companies have invested mind-numbing amounts of money into things that never panned out. It's the nature of the game.
a much better way to say the same thing i tried to. nice.
corrections:
3:00 compares SINGLE thread with an ever growing number of gpu cores lol. i find this graphic highly misleading. peak operations per second continue to follow moores law even on CPUs. people said moores law is plateau-ing for decades now yet it hardly slows down. the reason why you dont notice a difference between the 10700 and the 14900 is because both are "good enough" for normal every day tasks. however in productivity workloads like cinebench 2024 the 10700k scores around 740 points while the 14900k scores 2180. thats almost 3 times as much.
6:04 no, you can run, look at and modify any llama model code. its on their public github repo. you still havent learned that after your bad rant about meta. mistral on the other hand only provides the executable, not the code. difference between open weight (mistral) and open source (meta).
7:30 apple certainly didnt invent hardware accelerators or the the big-little architecture. also neither of that wouldnt show up on the funky graph you showed anyway because it only includes single core performance.
8:10 that is probably one of the worst charts you could have picked to show the slowing down of progress though i do somewhat agree with you.
He's a web developer, come on , they live inside a browser, which sucks all the gains in performance by being bad tech.
@@monad_tcpbro is cooking, even smallest Nextjs(which Theo loves) app uses 400MiB memory.
I'm actually getting pretty sick of how under-researched his videos are. I noticed this with the Llama 3.1 video where he absolutely butchered what the model size means.
The graph at 3:00 is classic NVidia shenanigans. These guys are totally incapable to ever produce a graph that's both correct and not misleading. And useful.
Other than that, yeah, good corrections. There's even more minor corrections, if someone would be pedantic.
I think that Theo deep down knows he's not accurate / properly correct, but overall he gets the big idea pretty well.
Edit: one thing that I can give Theo is that he does listen to the suggestions and corrections, well, as much as he can take in a limited amount of time. You can see in this video that when he showed the fast inverse square root he didn't say that John Carmack invented it. That's an improvement on his part.
Underrated comment
small factual correction: "One of the crazy things apple invented, was the idea of having different cores with different roles"... No they didn't, it was actually ARM. Hetereogeneous Computing strategies and Big/Little architectures were not invented by apple :)
they also did not invent the ideo of doing video encoding in hardware, idk where he got that from.
@@heidgunther4060 Apple users thinking Apple implementing a feature = Apple invented the feature is a very common thing.
Yeah, I don't know if ARM was first, but ARM started doing this in 2011.
@@celloninja While that may be a thing, apple does "perfect implementation" of ideas more often than not. They have top notch engineers working there.
@@celloninjayou absolutely nailed it hahaha
It feels less like LLMs have plateaued and more like the benchmarks are all being gamed and optimized for. Claude 3.5 sonnet, for example, is a cut above all other models.
it's not so drastically better that it doesn't belong in the same conversation. The amount of "I apologize, you're absolutely right" I get from Claude doing the typical LLM mistakes almost reads like a parody sometimes.
@@PraiseYeezus I think what shows this more is what happened with chatgpt. GPT-4o is drastically dumber than GPT-4, yet it's somehow considered a giant upgrade according to the published metrics.
I don't understand what you mean by benchmarks being "gamed" and optimized for? For what? AI/ML will plateau if it hasn't already. This is unavoidable. Maybe there will be small improvements in the same sense hardware plateaued a long time ago and there are only tiny improvements. I realize manufacturers advertise their new hardware as some massive upgrade, it's not. Maybe they'll do the same for AI models. Case in point, I still run a 2012 pc. Long gone are the days where a 286 was more like 1000 (that is one thousand) times faster than a 8086 and the difference was unreal compared to an not so older say 6510 or say a Z80. Now you might get point clock increments and maybe some more cores to compensate. The same thing will happen with AI programs, those who envision singularities are fools. Whether those singularities are AI or black holes, they all be fools imnsho. i.imgflip.com/3pvz1p.jpg
@@marinepower Absolutly perfect example of goodharts law!
@@marinepower yeah, even GPT 4 Turbo is clearly better
I disagree with the interoperation of graph at 8:39 , Its a benchmark score out of 100. It will always asymptote and isn't analogous to a TOPS or Transistor count graphs.
To see a real asymptote we would want a harder benchmark where we start at lik 0-20% performance, go up to like 50-60% with newer models but stop improving there and well away from a human expert score on the same benchmark
And also time of the release on x axis dont measure how much data were those models trained on nor how much computing power they needed. If there was any plateau it wouldn't be on potential growth of GenAI with bigger models
I don't know what people expect from an exponential growth, 100% then 200% and so on ?
@@LucyAGI we expect all the things tech bros promised when they were hyping chat gpt when it first appeared. Now that it does not deliver , you all cope and become wordcels to justify the failure .
@@LucyAGI Goal is, make an AI smart and general enough to where it can perform the tasks of a Machine learning expert and work on ai itself, if you can inference a lot of them, then you have 1000 experts (or more) making ai smarter, more efficient so on, thats where the exponential would kick in (theoretically, we dont know where the ceiling on intelligence is so the "exponential" could be short-lived)
When we'll get there is questionable. *maybe* we'll get there with LLM based systems, i believe we'll see llms being used as a weak reasoning engines in a system of tools, verifiers and agents within the next year, possible this falls through though and that llms scaling doesnt make them any better meaning we hit a hard stop ceiling and need to find another architecture alltogether but imo, thats unlikely as of right now / too early to say (as we havent made the aforementioned tools yet)
Yeah came here to post this, agree completely that all benchmarks would look like the scores are flattening as they reach the top.
Also with benchmarks like MMLU they've been through carefully and found a bunch of the questions have issues with them such as no right answer, multiple right answers, missing information in the question etc, which means that no system will get above a certain level.
Our brains are proof that computing density and efficiency have a LONG way to go before they hit a wall
Good point
This aged well.
Aren't the LLM benchmark score here in %? Thus of course there was always going to be a plateau at 100% anyway ... 🤔
Yes, I was thinking the same. Although, the issue of not having a better metric (measuring stick) is still true.
Ask yourself why percentages lmao. It doesn't make any sense.
It's the difference between 99% and 99.99% being a 100x improvement despite only being " about 1%".
@@jmd448 Probably better benchmarks will come out as both performance and expectations evolve.
The real issue here is that there are no cold hard metrics to compare, even comparing numbers of parameters isn't exactly equivalent to number of transistors in a chip...
@@LiveType Bullseye. Its a 10000% improvement if my math is right better stated as 100x though
Just to note, LLMs are the current focus and are unparalleled for natural language processing, but even if LLMs do plateau, I really do think there is further research + neural net architectures that will give us another boost forward in AI progress, over time I can def see multiple 'model architectures' working in tandem to complete complex work.
So basically, I think even if hardware compute advancements are slowing, progress and research into the fundamental technology is accelerating, and I hope we will discover breakthroughs which allow us to derive more intelligence from less raw compute. Yes, etched and others are working on altering the hardware architecture to fit neural nets, but there is much to be said to iterating neural net architectures to utilize raw compute orders of magnitude more efficiently.
well they kind of have already .... Llama 3 8B is really very good considering the amount of compute required. GPT 4o-mini is exceptional considering it is likely a tiny fraction of the size of its big brother. But the entire "scale to improve" thing seems like a hiding to nothing and incredibly inefficient. We need another architecture level eureka.
No all large models convert to be the same.
Well this aged well...
This happened with re-enforcement learning too.
The models had so many nodes that back propagation had virtually no effect, meaning they became too big to train any more and even got worse with more training.
Introduction: AI Plateau Hypothesis (00:00:00)
Moore's Law Explained (00:00:29)
Decline of Moore's Law (00:01:59)
GPU vs CPU Performance Trends (00:04:28)
Analog AI Chips and Specialized Compute (00:05:24)
Rise of Open-Source AI Models (00:06:18)
Potential Plateau in AI Model Improvements (00:07:20)
The Bitter Lesson: Focus on General Methods (00:09:53)
Importance of Algorithmic Innovations (00:12:33)
Future of AI: New Architectures and Methods (00:14:33)
Hype Cycle for Artificial Intelligence (00:15:30)
The ARC Prize and AGI Benchmarking (00:18:56)
Conclusion: Future AI Beyond LLMs (00:21:42)
You really misread that graph around 17:00... it's not a progression of a single technology across the hype cycle with every point being a further refinement that brings it further along the cycle, but rather each point is an independent technology that currently exists at some point of its own hype cycle.
Yeah, he skipped over the key legend at the bottom and missed the point of that graphic
7:34 Isn't it big-little idea from ARM? What about prior ARM?
It is but we all know Theo's biases towards Apple...
@@Garcia98 Apple doesn't even make CPUs , they only optimize the design of the ARM.
It is. Though it is from 2011, and Apple was using ARM for the iPhones since 2009, right ? They could've had some input on that, though the wikipedia article about big.LITTLE doesn't say anything about Apple in the initial development.
i thought he was rage-baiting, but hes just biased
Apple's M series are completely custom designs that use the ARM ISA. But they're significantly different from, say, a Cortex, and contain significant extensions like an option to use a stricter x86 style memory model to make Rosetta2 work at near native x86 speed.
I don't get why some people hate AI so much
Some rebuttals here.
First, I am an Engineer at Hugging Face, working on generative modeling, which, to me, is not JUST language models. It also include {image,video,speech}-generation models, too. I am mentioning it at the beginning of my comment to let others know that I am not a podcaster nor an influencer who doesn't know his shit.
1> Too bad that you only cared about compute and that too only from a single provider i.e., NVIDIA. TPUs have been there for a long time and has a moderate share of the market. They don't have the problems that NVIDIA GPUs have such as shared memory, availability, etc. It's a completely different stack and when done correctly, can be faster than GPUs. TPUs are, of course, primarily used by Google to train and serve their flagship models. But other companies such as Apple, Midjourney, etc. have been using them as well.
2> You only showcased public benchmarks but in reality, a business that is remotely serious about integrating "AI stuff" will have internal and private benchmarks which will keep on evolving. Also FWIW, in the domain of {image,video}-generation none of the models exhibit the performance ceiling yet. Let's not discard those models as they are quite business-friendly.
3> Model architectures and modalities exhibit different properties and hence, their dynamics are different. For example, Transformers lack inductive priors as CNNs, so, they usually need more data to counter that. Then, you have LLMs which tend to be memory-bound, whereas diffusion models tend to be both compute and memory bound. This leads to different engineering practices.
4> 3. brings me to this point. We have seen technological advancements across all facets -- data, architecture, pre-training, optimization, etc. Not all these axes have been fully exhausted. As long as there is a good dynamic in any one of them, things will keep improving is what I believe.
All your points are very much true, but you might be too hard on him. He's just a RUclipsr.
@@dirremoire A RUclipsr with a huge audience, which leads to influence and power over his audience. Every content creator should be aware that the mistakes they make in their videos are going to be parroted from then on until someone with a similar audience size can disprove them - but by then, the damage was already done.
I’m just a regular person, not a tech or engineering guy-I had to look up to see what TPUs were, for example-and your comment was very helpful and informative. Thanks!
Clearly the things that you cite are why Kurzweil doesn't use Moore's Law as the basis for his projections. He uses "computations per second per dollar", which is tech agnostic.
I work with LLM APIs regularly and am convinced that they are great with Kahneman's System 1 thinking, and once System 2 is brought online, the floodgates will open for widespread use.
He is right on a few things, that is, Theo. We consistently over reach our expectations whenever technology moves forward and pretend that this time is different. Your response here is clearly another example of that as you have not really rebutted anything so much as chosen cavoites. It does make sense from a point of view for sure, that is; you are in the thick of things working on it every day, your likelihood to be overly optimistic about what you work on is high, meanwhile a guy like Theo is dealing with a double whammy of obsolescence, not only his ability to be an influencer is under threat by content creation, his primary skill as a developer is also under threat. My point here is that you both have an extreme bias towards the situation which is part and parcel why you responded at all.
For Number 1; Not saying that TPU's aren't a better platform but this is probably more akin to comparing ARM To x86 architectures. ARM has been "better for a decade". But why isn't it fully adopted still? Exactly. You want to just ignore that entirely.
I feel like number 2 is really just changing the discussion.
3 and 4, I guess time will tell.
I personally have noticed that the expectations have gone up on me dramatically as a programmer, and as I use these tools, I often find that they still are unable to do basic anything and it requires so much work to create a final solution that I just would have been best to type it out myself and mentally break down the problem myself. AI is good as a text processor to help me manage and maintain lists and do mass text manipulation, its great at helping inspire me towards choosing a direction, but its ultimately terrible at doing anything really intelligent.
For sure we are slowing down, I think even you are agreeing with that, but there is no reason we are going to slow down forever, it may speed up again; trying to predict this is a bit insane.
LLM problem: The set of all plausible (but incorrect) outputs grows exponentially faster than the set of all correct outputs.
Even amidst authoritative data sources used during training, not all factually accurate data is equally true or helpful given context. And context is something that LLMs struggle with and sometimes hyper fixate upon.
For example: “The sky is blue.” True, but… is it as true as “The sky is sky blue”? What about “the cloudless sky at mid-day has the optical wavelength: 470nm”?
That’s not actually a problem. That was a silly little concept that Yann LeCun postulated and which has been proven completely wrong; this does not cause problems for LLMs at all.
@@therainman7777 Citation?
@@kurt7020 simple solution: feed correct data only
@@jimmyn8574 Generally if the solution is simple and obvious - you don't understand the problem. I have found this to be true of almost everything in code and life in general. Nothing is every easy. Learn. Try hard. Be kind. Best of luck!
The message of the video is correct, but I can tell you have no idea of what you're talking about, you're just making wrong connections and interpretations of what's in front of you.
All scientism is like that
The problem with Moore's Law is it became Moore's Yearly milestone. I'd argue once they realized the trend they would withhold performance knowing putting too much into a gen will guarantee a very small bump in next gen because we are reaching physics barriers where transistors are getting too close to isolate their signals / have them behave properly and hold up over time.
Intel did that with their 14nm .
That's now why they have competition from TSMC and Samsung.
They're going to eat that and have to sell all their factories.
Their production methods are still behind others.
They thought they had a monopoly and were able to keep the next steps and amortize it for the next Gen .
Didn't work, their competitors went ahead to the next nodes.
But even them are now struggling with their fake 3nm. There's no real 3nm.
It's over. CMOS is done, that mine won't put out more diamonds.
And any other method ironically wasn't researched because it would be a step down to performance levels of 1990 until 5 generations later, finally it out paces CMOS, that would be 2030.
They had to have stopped putting everything on CMOS tech in 2015.
It's too late to keep the Moore's law now. Its dead.
@@monad_tcp Yep I can't say I'm surprised what's happening to Intel is happening. You want to be the master of the clock, but it's becoming a honed samauri that's recently fading vs a machine gun. Intel just kept pushing smaller gains for bigger costs.
I moved desktops where I could to AMD. The power efficiency savings especially heavy load over time made it easy.
I started using macbooks once apple had the mac silica ARM. Late Intel chips were toasters in that aluminum body == recipe for early death. It was also stupid for mobile-use. I it to last long and not worry about "oh no turn off bluetooth, don't play spotify, dont have youtube going". I borrowed a friend's X1 carbon and it lasted like 2.5-3 hrs of real use with its Intel i9 vs easy 8 hrs from the mac.
I'll want to see a few good gens in a row to consider Intel again unless really really really is best use-case.
We would get photonics-based processors and maybe IGZO as the material change. Shunpei Yoshida the original creator of the LCD said IGZO is a really good candidate, offering upto 100x speeds with the same manufacturing and architecture.
@@monad_tcp Intel's mistake was thinking they could do 10nm without EUV. They sort of did but 10nm only shipped thanks to EUV.
@@monad_tcp Aren't we shifting to GAA transistors right about now?
>Moore's law is dead (again)
advanced packaging says "lol"
The AI model release chart doesn't look like it's plateaued yet as Theo stated. It still looks like it's still linearly progressing in terms of benchmark scores relative to time for each of the three models.
if you give a gpt double the compute, the improvement is the same forever due to scaling laws
A language model can read all the websites and books it wants to about how to ride a bike, but unless it actually has to control a bike and perform specific tasks it will suck at them, it will never be an expert on riding a bike. This is the fundamental flaw with current LLM models, they're fancy search engines (they have a use - but also a limit)
@@entropiceffect No if an LLM is big enough according to transformer scaling laws, its outputs will be undistinguishable from its inputs, so yeah a 405B model wont be able to but a 405T model will be able to. For example, if an LLM did have to ride a bike you'd give it a picture of the bike your using and than use an LLM to predict body movements, right? Well, since it has learned so much about bikes in its training data and because it is so large and can generalize so well it could do the task with ease. We never though an LLM could reason or simulate the world but when we
scaled it up it could do it.
This video literally stated that transformers wouldn't get better anymore even though all the newest papers show completely linear improvements eg llama 3 paper, this video was just complete misinformation
@@exponentialXPIt’s not the same forever; scaling all plateau eventually. But, we’re not there yet and he’s wrong to say we’ve plateaued. He made a number of mistakes in this video.
@@entropiceffectNo, that’s simply not true. You’re talking about tacit knowledge as opposed to explicit knowledge, but you don’t need tacit knowledge to be an expert at something; you only need tacit knowledge to actually _do_ that thing. Since we don’t need LLMs to actually ride around on bikes, they don’t need a that tacit knowledge. But LLMs can easily be experts on how to ride bikes (and many already are).
Lol Apple didn't invent (or first use in a chip) efficiency cores or video decoder cores.
Apple didn't invent a single thing to be more exact
8:35 i feel like this is missing that each of the models on the GPT4 line are getting smaller and cheeper; they are not intended to be smarter they just happen to be, the intent is to get the thing to run on a phone.
regarding improvements over time - as Nate B Jones pointed out on his channel recently, we're measuring progress incorrectly for where we're at on that curve. while going from 90% to 95% in a benchmark looks like a small improvement, this is actually a whopping 2x improvement *in error rate* which is what we're trying to reduce now. when you're measuring progress towards a limited goal of 100%, progress is going to look slow towards the end, but that probably just means we need new benchmarks and different ways to evaluate the results.
with regards to the ARC-AGI test: I don't understand how or why this is a test of language models - it's not a language problem, as far as I can tell? and I'd expect, if there's an AI (not an LLM) that can teach itself to play Go, the same AI would be able to figure out the simple rules of this "game"? so this just looks like a simple case of "wrong tool for the job"? I wouldn't expect a language model to be good at this, why would it be?
Yet if we are 10-5% away from 100% and if 100% is supposed to be AGI, we are nowhere close yet. Using percentages in benchmarks is so stupid.
@@SahilP2648 there's no real agreement on the exact definition of AGI, as far as I'm aware. but I don't think it's merely about perfecting current benchmarks.
The most 'general thinking' AI systems we have are LLMs
I think the idea of the ARC-AGI test is to try and get the LLMs to do logic and maybe math, because that's the part the LLMs are bad at. Or maybe they want to have a good way for LLMs to outsource it to a module which can do it correctly. Any solution would do.
@@RasmusSchultz when a model can derive new maths and physics equations like 20th century scientists, we have achieved AGI. That's a hallmark test. Doing this currently requires human intelligence and machines can't help, only experimentally. Theoretical physics is bound by human intelligence.
@@SahilP2648 this just doesn't sound like something a language model can do, at least not in their current form, being essentially just next word prediction. as you said, maybe through tool use, but what kind of tools? we already let them run Python code. I don't know. we'll see I guess. I'm a lot more skeptical than I was 3 months ago, because I don't feel like we're seeing the same breakthroughs, or even promises of breakthroughs. but we'll see 🙂
The reason performance is plateuing is because the scores are getting close to the maximum value of 100%. You obviously can't increase by 30%+ once you get to 90%. This is the S-curve phenomenon.
I think the whole point of singularity isn't that each specific component will get infinitely better, it's that that component will help us discover new components in which those components will help us discover new ones. And the rate of speed between accelerating quicker and quicker.
Just as agriculture was a component that lead to the industrial revolution
Then the industrial revolution to electricity
Electricity to the computer age
Computer age to internet
Internet to AI
AI to ...
And all of these things had sub branches. Computers helping with simulations. Electricity to new forms of motors etc...
Yeah big things are always built on iteration. There will be more advancements, we (obviously) just don't know what/how yet. as it always has been
Gordon Moore “Some Dev” 😂
Gordon Ramsay "some cook"
@@potato-yx5te That likewise is a terribly inadequate parallel...but at least they are both named Gordon. 😜
@@YadraVoat I agree haha, im sorry as well.
He was a dev. Deal wit it fool. Don't cry.
@@mattolivier1835 don’t be a clown here
you have no idea what you are talking about, turbo is a smaller faster version of gpt 4, of course it wouldnt be better, it was a leap for it to have clkose to the same performance, and 4o was an even smaller model that is multimodel, so it can input and output text, image and audio all at once, which has never been done before at that performance level
Bro there are tons of experts saying that we are plateauing, and I have been saying Ai was going to plateau. like, literally, if you understood, how computers work. it would be really hard for you. to try to rationalize. how a. I wasn't going to plateau.
@@EDM-Jigger there is a relationship between loss and compute, basically we are bottlenecked by compute, bear in mind current AI models are less than 0.1% of the size of one human brain, and our architectures arent as efficient as the brain. We are still very early on in AI and it wont plateau because we will keep finding ways to improve models
@@incription Yes, and or no, I'm going to get any stronger underneath a classical compute logic. How would it if it's 0.1 percent the size of the human in. a 4.7 gigahertz. processing core is only capable. of completing 4.7 billion calculations a second. and we can't cramt transistor zone any smaller. without having an electrical current jump to the next transistor creating an ecc error. in the L-3 cache of your CPU. How on Earth do you think we're going to get that 4 billion into the trillions? underneath this classical compute logic. of true and false which stands for one. or zero.
this video is gonna age like milk
🍓
already has lol
Thing AI advocates always miss, is that the growth of these models is rapidly constrained by compute and energy resources, those exponential curves assume society decides that AI is a good place to direct all those resources
And nvidias ceiling (money grab) on vram. My hope is Mozilla’s llamafile gets use closer to at least being able to offload more to cpu and ram.
Hubris. I tried to say to some of these enthusiasts how the electricity consumption by these models is not very viable. The only response I got, 'hur dur, it will get more efficient'. As someone from a third-world, I understand there far more important things electricity is needed for than text and image generations.
@@thegrumpydeveloperit always make me angry. Why don't we have an ATX standard for GPUs to decouple everything , why do we need to buy RAM from Nvidia at extortion prices for the same shit everyone uses in their main memory.
VRAM is RAM, the only difference is that your CPU bus is 512bits (4 channels of 64bits) and GPUs are 4096bits and they have a couple of extra lines for synchronization as their access is less random.
But it's roughly the same shit, witn small changes in the memory controller.
Always remember that exponential curves IRL always turn into logistics curves once they hit some kind of limiting factor.
@@3_smh_3 they don't care it cost $1/h to use GPUs and when the fake marketing idea of AI is to replace humans with minimum wage of $15/h.
Capitalists salivate at that thought.
No one is looking at countries that earn less than that, or where energy is expensive.
Also third world countries are corrupt, their energy cost is usually way, way less than rich countries, it's all taxes on energy, that's why they're poor. The energy cost is actually not higher because there's little demand.
Which is ironical as the data labeling used to make AI is probably $0.2/h for human actually doing the work.
AI is basically just outsourcing, but in time instead of space.
Turbo and Omni models aren't about scaling up. They already said after GPT-4 was trained that their plan was to first make them efficient and multimodal before scaling up again, since that was far more important and sustainable. I'll wait for actual data on the next model up before I make assertions that require it.
the whole point of the architecture behind chatgpt is that it scales completely log(compute) linearly, it will never plateau
Give a gpt double the compute as much as you want and th3 improvem3nt is the same
"everything that can be invented has been invented." Charles H. Duell, 1981
I hope no one took him seriously after that second again
The current models need to be scaled up. We're nowhere close to hitting the "physics" equivalent for AI
Except they're getting smaller as time goes on now. Smaller yet more efficient. If rumors end up being true, gpt-4o mini is something like 8 billion parameters.
@@timsell8751it’s both. The largest models are getting larger and the smaller models are catching up quick to the previous generation of the larger models
@@timsell8751 that isn't an argument against, if the newer traning techniques are applied to the larger models we'll see a jump. They haven't been applied coz it's probably expensive to retrain the larger ones. If anything shows that the more efficient techniques will create an even bigger step up for gptv5
Takes like these are going to age like such horrid milk lmao
“we are in Moore’s Law squared” - Jensen
ai is beginning to synthesize data and creating its own ai models that will create its own synthesized data and ai models based on what taught it, etc. in a positive loop 🔁. (This is September 22, 2024 today).
Eventually this will grow at such an automated pace that we will be living with AGI then ASI.
it’s kinda hard to imagine what things will be like even by September 2025.
Apple M1 single thread geekbench 6 score = ~2400
Apple M4 single thread geekbench 6 score = ~3700
So 54% improvement in 3 years. ~15% per year.
MultiThreaded performance can increase as much as you want to pay for extra cores.
I love that you take the hype cycle chart as fact, based on what data did people determine that AI's hype and decline looks like that. What data points were used to construct the chart at 15:46? Looks like its made up.
The entire history of human technological advancement has been filled with plateaus and eventual moments of overcoming them.
Sure, but the question is, how long will the plateau last? A few years? Or are we talking about decades before we get another breakthrough?
@@Tom-rt2 I'd give elementary level rudimentary agi 5-10 years to emerge, with notable marked improvement every 5 to 10 years thereafter.
@@Tom-rt2 Even at the current plateau, AI renders vast business practices obsolete. We haven't even begun adapting it. I just went to the DMV to renew my license and was given paper form to fill out.
@@Tom-rt2 There's no current plateau in my opinion. We're FAR from one. Companies are releasing smaller and smaller improvements to their models instead of waiting many months to release big ones. We didn't get GPT 1, 2, 3, 4 one after another, there was significant time between their releases, and this time Sam himself said they'll instead release smaller, more frequent models, hence they're called 4turbo, 4o, etc.
If the strawberry leaks are to be believed, GPT5 that they have internally is insanely powerful, it has a much bigger jump than 3 to 4. Except they can't figure out safety, so they won't release it any time soon.
If for any reason they're going to slow down, isn't because of a plateau in performance, but a plateau in safety.
Yeah this channel and the commentators here are just coping because they will only be able to date AI gfs in 30 years.
I refurbished my PC around 6 years ago with a 1080 NVIDIA graphics card. I have yet to need to upgrade it for any reason. Not remotely surprised how quickly AI is plateauing given how much money is being poured into its development.
“Single threaded” was never a qualifier for moores law
15:50 Hype Cycles and how it connects to the stock, forex, crypto market.
i get your point, and you are correct. LLM's might hit a plato however, this does not mean it will stop the development of ai products. The product innovation and foundation models do not move at the same speed.
The graph you are showing is for a single-threaded cpu, but we actually started to have multiple cpus working in parallel so the moores law kinda still works
So many bad takes and half-knowledge.
Every fucking Technology that uses a computer chip, evolves over many models in a S-curve, and every fucking time we get to the and of one S-curve we have already a new tech that performs worse but than exceeds the old s-curve.. this happen over the last 40 years over thousand times, and this will not stop.
this has nothing to do with economics, nothing with bad software or even with society, technology is at a point where we see every 4-6 month a new "thing" that is in its low 10% performance but beats the old "thing" that is already in its 90% performance. Every fucking time. Only Wars,Diseases and market crashes stop this for a fev months.
Aha. Until the next leap forwards. I don’t get how people don’t get that innovation comes in leaps and then increments that optimize the leap, then the next leap etc. With some leaps actually being dead ends long-term. We are so used to the fast moving increments we miss that leaps are happening all the time but take time to become integrated and then incremented upon.
AI will not stop Improving.
But the models of computers themselves and the way scientists train and bulit AI .might have to significantly change
Imagine how good this video could have been if this guy had remembered the word 'asymptotic' from his college classes...
The platou in performance may have something to do with the benchmarks used in the graph, models are achieving 95%+ on MMLU and MATH so there isn't much room to see improvement. Hugging face revamped a bunch of the benchmarks for that reason
no its because they have less difference in compute, and you can scale llms infinitely btw theres no plateau according to transformer scaling laws
Moore's Law is not about it being 2 times "faster" or 2 times more dense, just that we can make working chips with double the amount of transistors (they can still be bigger, more cores and so on, the biggest chips now are usually GPU's). It still works that way although it's moore (pun) like every 2.5 years or so now, and instead of just making things smaller we do more layers and have better heat transfer.
And the LLM's were always at a plateau, it's just that we can put in words what the limitations are now. All improvements now is just adding abstractions on top that make better prompts for you, a kind of transpilation and db layer that's not directly related to LLM's at all.
Imagine rage baiting data scientists while wearing your mom's shirt.
Since you asked, A lot of it is the co-piloting that makes it so remarkable. Sure LLM's on their own won't replace 100% of jobs. However what we are learning is that "good enough" showed up last year and we are learning how to make better use of what we've got. As you mentioned the hardware will be a big part of it, but like you said we have diminishing returns there. What we are quite likely to see is an app or likely brand new kind of software that shifts gears through different kinds of AI.
When the first steam engines were pumping water out of coal mines, they had this perspective. We need to realize that we don't need more efficient engines initially, what we need to learn how to adapt what we've got in new ways. *That* is the way we'll find under-the-hood AGI.
I think you are confusing AI with LLMs. LLMs are a subfield of ML and ML is a subfield of AI. AI is a big field with many things in it.
yup that is infuriating bro he forgot all the other field that progress extremely fast like the image recognition or voice ect
Yeah thats excactly what I also thought
But the LLM paradigm is what is considered the future of AI.
Almost everything can be viewed the way LLMs do (states, next token prediction) and so all these other AI models would be obsolete if the hype about LLMs and transformers are real.
Was there any graph that showed how GenAI improved over used computing power to train models? I feel like this is the only thing we should look at if we were to believe GenAI can be improved without guiding it in one direction. There were benchmarks scores over release date but that shows like noting
I agree that raw LLM intelligence is slowing.
Claude Sonnet 3.5 's big leap for me is the artifact and project knowledge base features they built into the product... not the actual llm itself.
The LLM being trained to work smarter with tools is the next incremental improvement.
For example, can I fine tune llama 3.1 to work with a custom vscode extension that can search, replace and delete inside my codebase?
That would smart tool use. Not AGI. Anyone else thinking like this?
I trained one to predict commands in my command line based on my history.
I would never trust any cloud service with that data, I had to do it myself on my own hardware.
Claude 3.5 the model itself is also a big improvement, why? Cuz the small improvements it made over lets say gtp4, ppl were saying it would be very hard to and would take years do but they managed. But yeah artifacts is insane, read somewhere its a system prompt so that's even more insane.
no llms scale log(compute) linearly infinitely with no plateau, the reason it is slowing down is because there is less compute difference between model v2 and v3 eg
I think apart from microchip architecture and other technical issues the main wall AI is going to face will be energy consumption. At a certain point it will drain too much energy from other fields in society making people question if it is worth it.
Why do people keep saying AI isn't going to keep improving when everyone who stays current in the field sees the opposite? The last 8 months have been insane and it's getting even more insane over time.
because most benchmarks are flawed, and some pretend they aren't
if you give an llm double the compute its improvement is the same, it doesnt plateau.
A lot of people don't want AI to do good, because of X reasons. This means they are biased. Isn't console gaming still kinda popular? Why are there so many people saying it's better than PC gaming?
Those two completely different topics have the same answer: people want to feel like they made the right choice. Like if they took the right career, choose the right gaming platform, got the right job, etc.
Btw, if you are programmer, and you are reading this thinking that I'm talking about your job, because that's the popular thing to say: no, AI will not replace your job. I know it. You know it. Everyone knows it. Again, people LOVE to say that because they are biased. They love to believe that they were "saved" by dropping out of college, or that they avoided a huge waste of time by not studying, wanting their time partying or in videogames. Again, when people want to believe something, they will. Let them talk. Reality hits hard, and not only programming is among the safest career now that AI is a thing, but also, chances are they are doing one of those jobs that can easily get replaced by AI, actually do their job, and they aren't even paying attention to it.
Everyone in the field just wants to see the opposite. Obviously, they need to overhype the tech, otherwise investors just flee.
Because username : check
Great video. Makes sense. I would actually say that the amount of sense this video makes will double every year.
GPT-4o is obviously much better than GPT-4 turbo. And Claude sonnet 3.5 is better than GPT-4o. And the smaller models like Phi3, Gemma2 and Llama 3.1 are rivaling the much,much bigger legacy ones like GPT-4. As for the benchmarks you can't just look at the increment in absolute value. You have to consider that it gets harder and harder to increase: going from 96->97 will probably require the same amount of "improvement" as going from 70->85. And the benchmarks are also gameable. There are plenty of fine-tunes with significantly higher values than the original models without them seemingly being better in general.
The actual next step here will be similar to what happened in early 2015 with vision models when we realized most weights were useless and wasting compute, we'll optimize.
It's a very natural cycle in AI research that has been repeating forever now.
However, one thing that is super weird with the current scale of models we have is the emerging ability that some of the models seems to be exhibiting.
Like very very large models aren't behaving the same way as their "smaller" counterpart (who still have billions of parameters).
Some of the largest models out there behave like they are able to be meta-learner, even without changing their internal weights.[1]
What I think will happens now is we'll converge into a semi-universal way of encoding the data, which will force a change and optimization of architectures (which already is kinda happening across the different provider). If you look at the massive success that Meta had with Segment Anything family of model it's pretty clear that's the flow. [2]
These two force combined will give us a first truly multi-modal AI that will be less wasteful than current mega models, while keeping the interesting emerging abilities. That + more optimized compute will give way to the next wave of AI advancement for sure.
[1] research.google/blog/larger-language-models-do-in-context-learning-differently/
[2] ai.meta.com/sam2/
Very bold title coming from someone who has no experience in this field.
Yeah this guy’s a clown for this video. Too many mistakes, misunderstandings, and unwarranted assumptions to count.
Yes this youtuber is an absolute clueless clown
Ad hominem
@@freeottis It’s not an ad hominem. If anything it’s an appeal to authority, but OP wasn’t claiming that the guy is wrong simply because he doesn’t have experience. He was commenting on the extreme confidence this guy makes his claims with, when even experts in the field of AI with 20+ years of experience tend not to speak in such certainties when making predictions, because they’re educated enough to know that it’s a very difficult field to predict. Especially when it’s moving as rapidly as it is now.
@@therainman7777 why is it not an ad hominem?
I didn't understand 6:14
Llama was released with the model, and the weights, so I'm not sure how "it's not technically open source".
Perhaps I misunderstood though.
Meanwhile Groq is running faster than the latest Nvidia chips, all the while running on ancient 14nm printouts.
We still need Nvidia hardware for training thanks to CUDA, but that might change because of the ZLUDA project, allowing AMD GPUs to use CUDA code. But I only have surface level knowledge about all this, so I am not sure. And Groq's chips are specially designed to have very fast access to memory with faster bus or something, allowing faster processing.
Safe to say, investing in Nvidia right now will turn out to be profitable for some time to come. That is, if you believe that AI is just getting started, as I do. Everything I'm reading points to NVIDIA retaining their lead for 2 - 3 years at least. Fabs ain't easy to make.
I did not know that about AMD and Cuda though, gonna have to look that one up. That would be huge if true - and it would be great for everyone going forward, as I'm worried about this insane monopoly that NVIDIA could gain/already has. But even then, AMD is 10-15% of market share compared to 75%-85% with Nvidia. Gonna take quite a lot to significantly impact those numbers.
*let me summarize...*
1) language models that "train" using scraped data will massively slow down (overhyped and lack of new data to learn from), whereas
2) Machine Learning that involves setting up a problem to solve and having the computer learn to solve it will accelerate (i.e. NVidia DLSS upscaling... make low res image look like high-res image... and eventually make UNHEALTHY cell look like healthy cell. i.e. devise a treatment)
This video has aged like milk lol
new video title: "okay now i'm scared" referring to the new o1 modell
Came here from CNBC report on the same topic. Congratulations for being ahead of the curve.
CNBC is bs
🙂↕️
You are looking at a one-year window in AI abilities, and even if we disregard the fact that moving closer to a 100 percent correctness in tests will require an exponential effort, there is absolutely no grounds that support that LLMs won't change in structure and training material, and that the hardware will not move from GPUs to something more specialized and intelligent. You cant really compare LLMs to something with physical limitations like transistors/CPUs. It is a bit like taking a handful of neurons and mashing them all together expecting to get a brain out of it.
My hunch is that you could run super-human AI on already existing hardware, just by improving software architectures. The chance that hardware development will flatten at a level below what will be required for super-human AI is in my view zero, even if you take the most negative view on future hardware developments. Physics does ultimately set a limit on how much intelligence you can get for a certain amount of time and energy. Intelligence itself might have a plateau; after all, once you figure out that 2 + 2 = 4, it's hard to improve on that answer. I don't think diminishing progress in computer hardware will stop AI though.
comments are:
-criticizing the benchmarks
-attacking his credentials
-nitpicking semantics without arguing the point
-"wait for -...."
-accusing people of having no vision while providing no concrete examples of practical uses
-comparing their ai girlfriends to the invention of the wheel
They are quite emotional. As a layman, I expected to learn from them but all I got was people just being salty...
Interesting take, but I think the chart's a bit misleading. It cuts off before showing GPT-2 (Feb 2019) and GPT-3 (June 2020), where we saw gradual improvements before a big leap. The increased milestones on the right might look like diminishing returns, but they really reflect growing competition and innovation in the space. We’re seeing more players in the game and continuous advancements rather than a true plateau. The competition is pushing the boundaries, not signaling a slowdown.
What I can see from this problem is we are not able to scale up because of:
1. Hardware Limitations
2. Algorithm Limitations
3. Poor quality data.
So we cannot scale, thats it
We need a quantum supercomputer, and we need a new algorithm designed from the ground up for quantum computers. Research Orch OR theory by Sir Roger Penrose. If this theory is to be believed, then consciousness is because of quantum entanglement which means quantum mechanics plays a curious role. This would also mean all these benchmarks are futile since we can't even predict yet how exponentially superior quantum computers can potentially be, and by extension the AGI systems developed on quantum computers.
@@SahilP2648 Quantum computers aren't a new tech. Its like more that 12 years when it was successfully. Actually, quantum computers aren't traditional computers. They big, expensive and critical science project. I think quantum computers are really hard to work with. We all have to see what future is going to be like. LLM's are great but again they hitting their limitation not on the hardware end I would say (not at the current moment) but dealing with algorithm issue, we need to research more new algorithms. And we don't have quality data, new legislation are going to be another challenge.
I'm always amazed when someone brings up quantum computers, because quantum computers just solve 1 class of math problem we can't solve with traditional computers.
Pretty certain the GPUs we use for LLMs/AI, don't depend on this math problem class or aren't limited by this specific math problem class. What I do know the specific class of math problem, GPUs obviously are the most efficient for it (when not using quantum computing), because they are the best for math.
Their are obviously people working on trying to use quantum computing for AI, but that's a big research topic with no easy solution.
4. Inbreeding
I had this realization recently. I've always felt closely tied to technology, especially when you're constantly having dreams and visions about the "future". What's interesting, while playing Atari 2600 40+ years ago, I had a vision of a future computer/game console that was a crystal or diamond cube that had "Mind-Fi", a literal program that played in your mind that mimicked reality. No wires, plugs, controllers, or physical storage medium. Just a 3" clear cube lol
Cell phones are a great example of tech reaching a plateau. When was the last time we had anything but incremental improvements to our phones? AI is running out of information to train on.
Video game graphics also
And cars. More and more doo-dads, still a box on four wheels.
Ahhhh it is so validating to see o1 drop. You're all misunderstanding. AI progress is not about progress in one regime, it is a uniquely synergetic field. Looking at progress of one part of the whole does not give you an appropriate perspective on what's happening.
No shit a score out of 100 plateaus rapidly. You're completely wrong about OOM increases in computational load on these models, GPT4o was CHEAPER to train than GPT4, there was not a comparable OOM change as there was from 3.5 to 4. There are very simple reasons to argue that LLM performance might be plateauing and that it certainly will pretty soon, but this chart is not it.
The only reason we are reaching a plateau is because they're being optimized for a set of benchmarks. We need more benchmarks so the models can get good at everything
Technology is increasing exponentially. It will continue to do so without regard to feeble interpretations of Moore's Law.
Tell that to anyone working in mechanical engineering. Every technology hits a limit.
Automotive technology is a perfect example of something that has had minor improvements over a 100 years of development.
Cars are not getting substantially faster. If there's a Moore's law in automotive it's measured in centuries not years.
@@katethomas1519 Actually, I think saying that mechanical engineering has neared or reached its limit is off the mark. There are still so many advancements happening, like in compliant mechanisms, which could revolutionize things by making parts lighter and more efficient. And let's not forget about new alloys that are stronger and lighter than ever, or 3D printing, which is opening up possibilities for complex designs that were impossible before. Mechanical engineering is far from reaching its limits; if anything, we’re just scratching the surface of what's possible.
@@katethomas1519 Cars aren't getting a lot faster because there's no need for them to be any faster. But companies do see a need for AI to keep improving.
super narrow perspective.
CPUs single core performance has not been increasing rapidly, but energy efficiency/usage has gone down fully.
CPUs no longer are used to full capacity, and you can always parallelize to more coress.
When we got to 1nm, 0.5nm we will have fully efficient CPUs, that are becoming cheaper every year due to optimization in manufacturing and reduced marketing.
Then it will be all about stacking more cores, utilizing better cooling and probably switching to ARM, which can give huge boosts to compute.
Also most things are going to the GPU side of things, so we may soon see a shortage of need for CPUs.
Finally till we plato, which will happen in 5-6 years, quantum and AI will be the main pieces.
The modern computer will probably be out of usage in 10 years time tops. It's probably gonna be server based connection with a subscription, and you just having a screen, keyboard, mouse to access using satellite or 6G outside, or wifi through home.
Not to mention there is a finite amount of documented human knowledge and art to train AIs with. Increasing the capabilities of AIs will hit a wall very soon when all of Wikipedia etc has been used.
I'm pretty sure it's consumed all of Wikipedia already, or do you have a source to the contrary?
@@merefield2585 that's my point, all models probably already have all of Wikipedia and are stuck waiting for humans to make changes and add more pages. AIs can't really come up with their own ideas (and judge their value) yet so they are forced to wait on humans telling them what is correct and not (and we can't even agree most of the time lol)
@@KarlOlofsson thanks for clarifying. Have a good day 👍
@@KarlOlofsson We wouldn't want that tho. AI is a tool and we are the ones taking the decisions. Why would training with Wikipedia already be a problem? If you ask a human to learn something, and he reads all the possible knowledge about it, why would that be a problem?
@@waltercapa5265 because you can't call AI intelligent if it can't figure things out for itself, then it's just a glorified search engine or robot. I call them ML-bots. They have to become self learning to not be dependent on human bottlenecks but that will likely never happen. AIs will likely become very simple assistans but nothing more.
They are currently teaching it how to generate new data, meaning the ceiling will be removed. You could not be more wrong, lets see how this ages.
3:48 I got a secondhand M1 (with the minimum memory and storage) from FB marketplace for £200, not including shipping >:) It has a small aesthetic defect but works completely fine! I felt so proud of that deal.
Not bad but considering the recent need for memory, you would need to upgrade soon. Enjoy it while you can though.
I could be wrong, but I dont think apple had much to do with big little cpu architecture. ARM have had that in their designs for over a decade now.
Additonally, the idea of having specific chips that handle specific tasks has been in use by CPU manufacturers for a long time. For example Intel released their Quick Sync technology in 2011 to improve video related operations. So though Apple has this too, its not something Apple has made mainstream or had to "bet" on, its very proven and very common in all cpus from desktops to phones, they are just more abundant in ARM devices due to their reduced instruction set compared to x86
To quote a wise man: "The ability to speak does not make you intelligent". We've only learned the AI to speak not to recognize what to say
I think it does make someone intelligent, he was wrong
That makes it as intelligent as the average person.
It's like we've invented the speech part of a synthetic brain, but we still need to invent the intelligence part.
You talking about Theo or the AI?
@@luckylanno The problem solving aspect. Ability to figure out effective solutions to novel problems (Not seen anywhere before). I think a good place to start for this is the Millennium Prize Problems. AI is allowed to research, learn appropriate problem-solving techniques on its own, but it must solve the problem at hand.
“AI isn't gonna keep improving” he talks about AI reaching a plateau. That makes no sense in this context. It's funny to me that his own starting example proves why his statement is nonsense. AI labs aren't locked in to current designs. They already know its limitations well. And if they find out it won't lead to AGI they will switch directions without hesitation. They Don’t care about trends or predictions based on what already exists because they have always been at the limit of what is known to be possible. The top AI labs goal has always been to exceed what we know to be possible to achieve their goal.
We know AGI is physically possible. We know it's a thing that CAN be done. We have existing examples. But we don’t know HOW it can be done.
If we just cared about making one AGI we would just have a kid. Or adopt a pet. We already have intelligent systems we can raise and use.
The quest he asks that makes me think he's dumb is if we can improve. I can bet my life on “yes” without hesitation. The goal is achievable. The question in the air that he should have been asking is can we achieve the goal in a timely and cost effective manner. That is what the problem is.
In real life right now, if labs hit a roadblock it's their job to keep going. And that's how we make progress.
“Oh no we hit a roadblock so I guess we should just give up.” Said no one ever. They will keep going until they get to their goal or run out of money trying.
The reason computers keep getting faster isn't because of some natural force. It got faster because people made it faster. People for various reasons wanted computers to be better at computing, and they have continued to make the breakthroughs necessary to complete that goal. The reason computer architecture continues to change is because the architecture doesn't matter, the goal does. If current computer architecture does not fit our goals then we change it. If current AI systems don’t work, we will simply change it. It wont plateau not because of the existing technology but because of the effort of those working towards their goal.
He's obviously talking about LLMs specifically, and he's correct. LLM and AI have erroneously become synonymous at this point
nah bruh your nvda stock is gon go down once this overrated hype is done it already is goin down
@@jaehparrk I don't own any Nvidia stock btw.
That's a falacy. Any Computer Scientist who really studied Operating Systems and software evolution understand that we're far off from PROVING we've reached a plateau.
My point is - there is absolutely NOTHING that indicates that we've reached a Plateau in software intelligence development.
We're just seeing it narrow. The LLM's models, which are purely based on words (strings) might have reached a plateau. Specifically because the theoretical model - transformers - might have reached a plateau. But there are tons of other Machine Learning in the works, not only in language processing but in robotics as well.
It's just too silly to think that AI has reached any kind of plateau. We just got started.
The reason for the AI plateau is how bad the major AI Developers are.
For example, when it comes to pre-trained Transformer LLMS, there haven't been any real, major advancements since GPT-2. They just keep throwing more resources, and additional side gimmicks, at the existing concept.
These create an illusion of progress, but there is no new technology or theory involved.
Open AI is the worst thing in machine learning technology, but unfortunately is also the richest, because they've sold out everything they used to pretend to stand for. They are secretive, proprietary, and lazy. They don't produce new ideas or technology, they just keep shoveling money at old technologies.
"For example, when it comes to pre-trained Transformer LLMS, there haven't been any real, major advancements since GPT-2. They just keep throwing more resources, and additional side gimmicks, at the existing concept."
That just isn't true.
RAG, for example, is a significant innovation, not a gimmick. The capacity scale context windows of LMs from 512/1024 (SOTA GPT-2 at the time) to 4096 on every LM and 1M+ on new big ones is a major innovation (likely involving transformer-alternative architectures). Multimodal LMs are an incredible advancement. And last but certainly not least, the ability to make LMs 500-1000x the size of the largest GPT-2 version, that took innovation in compute resources and model training efficiency. That was all in 5 years. There's no illusion in that progress.
@@dg223-p1r Wow, they are SO corrupt that they just deleted my rebuttal, which is all facts about how every single part of what you described is nothing but a plugin or expanded resources on what is, essentially, GPT 2.
They are criminals.
@@dg223-p1r Wow, the criminals keep deleting my entirely-EULA-compliant response.
Everything you list is exactly the kind of gimmicks I am talking about.
None of them are an advancement of the model.
@@KAZVorpal I'm curious what innovation you feel took place in 2019 with GPT-2. The transformer came out 14 months earlier, Dec. 2017. What was so great about GPT-2 that it counts as an innovation on an architecture but nothing since does?
@@dg223-p1r You've got me there. GPT 2 was the first implementation that worked well, but that's more a Turing Test thing than an innovation in architecture. I was aware of this, but trying to start with the one that fundamentally behaved like the modern version.
So, really, OpenAI has created nothing at all, beyond being the most successful implementation of Vaswani's Attention is All You Need.
But, again, I felt that is a separate discussion, what I was focusing on is GPT 2 being what feels like the same thing as the modern product.
Either will videos that become archaic in 21 days about AI
Aged poorly
how so?
There are still huge optimizations to be made for other languages. The biggest improvements may not be in performance, but rather in model speciality.
Autonomous cars are NOT far from functioning, they are driving people around San Francisco. They are here now, hence they are on their way up the second bump in the hype cycle (the realisation phase)
Do they work well with Inclement weather yet?
@@iso_2013 Not yet but they will. Give it a decade or so.
@@AFellowCyberman A decade or so is not "here now"
They got forward training based optical modules now. That goes up in energy efficiency and cores speed for a while and multiple colours going through the same circuit.
To train something with a trillion parameters a few exa flops will do but if you want to train a quadrillion parameter model you need 1000 yota flops.
saving this video in the "aged like milk folder" to look at 5 years from now.
Gonna save your comment to laugh at it when AI hit the plateau when they don't have any training data left
@@realkyunu yeah sure buddy "cars are never gonna replace horses."
Cope.
@Marduk401 LLM are improved with high-quality data. What do you think happens when they run out of high-quality data, huh? Feeding LLMs their own output doesn't really work.
@@realkyunu there are a bunch of ways to solve this, and the training data limit is unlikely to be reached ever. Do you understand how big we are talking about?
It's like saying you'll drink the ocean and leave earth to dry.
But instead of getting technical I'll tell you this again. You think that the horse will never get replaced by the car judging from a technology that has been popular for the last 5 years?
Yes people said the same things about the car, the tv, the freezer and every technology that ever existed.
Give it 5 more years at best.
The fact that you think this is gonna somehow stop because of an early suspected limitation like training data ending (so vast that it's currently impossible)
Ask yourself this. when in the history of mankind has anything like that just "stopped"?
@Marduk401 "Training data limit is unlikely to be reached ever." When it comes to totally useless data, you are absolutely right. That trash is produced on an incredible rate on the internet. But humans cannot produce enough >high-quality< data to be consumed by the LLMs in order to keep up that rapid growth. ChatGPT had a lot of air up at the beginning, but this will end sooner or later. Maybe not in this or the next year, but after that? But I am curious: How are they gonna train the LLMs after the "end" of tons of high-quality training data? If they have a method to endlessly train their "AI" otherwise, I am totally with you. If they don't have a method to train their LLMs infinitely without using high-quality training data, then those things will replace shit.
Dude, when someone completely develops a "video to video" AI that would be PLENTY to change the entertainment industry for decades to come!
All AI art efforts should focus exclusively on that. It's been years since Ebsynth started to develop that.
Yeeee, i love it when random swe make videos about AI. Unless you are a data scientist, ml engineer, or researcher, your opinions mean less than nothing
Lmao
One of the biggest things that will change how fast things can be is the semiconductor material. Everyone keeps trying to make silicon faster, but we've already found better options.
Honestly, this would be an absolutely wonderful place for AI to stop
I agree hahaha
💯
Yeah, stops before it takes over our world😂
You mean unreliable bs of tech which should’ve transformed how every single job is done? Definitely nice place.