That's Apple's pricing structure, they want to look competitive with the pricing with the "base" model while charging some insane premium for the usable ones people would actually buy.
@@Artanis5 There's some technical truth to that, at least on iOS and iPadOS devices - they do transparent compression on memory pages not in active use. It's nowhere near 2x though, but using 3GB on an iPhone where 4GB would be necessary on Android is tenable. Apple is largely a marketing company, but this is one of the few examples of good engineering, partially enabled by the fact they've got a higher degree of integration from hw to sw. With that said, if I were to buy an iDevice for a premium price, I'd want the full 4GB or 8GB or whatever and for this tech to be used to increase multi-tasking performance. LPDDR is sufficiently low power that the battery gains from cut down memory are not as useful and it's not as if they're passing the relatively minuscule cost savings on to the customer.
People: we want better battery life in laptops. Microsoft: we will use NPU to do tons of work in the background. People: disable it to get better battery life.
Much more vocal is the sort of person who solves one side of a Rubik's Cube and gives up. LTT just showcased the first OLED TV, it was not preceding a world where every monitor is now travel-sized. After sufficient adoption of NPUs they would eventually be many times more efficeint at tasks and save battery more than conventional CPUs ever could. But it's a question which is better, integrated NPU in the CPU when we look at the comparison with GPUs.
@@sboinkthelegday3892 I just want a well supported dedicated card for the NPU that I can plug in via something like PCIE. Doesn't need to be on the CPU or GPU imo, though a basic one on at least one of those may make some sense for lightweight tasks, similar to graphics modules on CPUs today I think.
That's not what a NPU is for. Just because Microsoft requires one for CoPilot Plus's REcall Feature does not mean the NPU is spying on you, the software is.
@@sem_skywalkeryour Nvidia GPU is made to do parallel processing, which just happens to be very efficient for graphics which is why the drivers render graphics through it So yeah, the drivers tell the processor to make graphics
@@sem_skywalkeralso an npu is just very efficient at some operations that most ai models use a lot during inference It's doing everything except for spying on you, especially since the npu being in your laptop means the processing can happen locally without sending your screen to Microsoft
@@ahmedal-hijazi3618If it falls hard enough, the GPU market could open up massively for new competitors. Really unfortunate that for professional work its Nvidia or nothing.
fuck yes I'm so tired of hearing about AI every time I open any tech media! This has happened with the Big Data fad, VR, and other tech trends. But so far this is the most obnoxious one because it is literally everywehre even your grandma is talking about AI for some reason! They are playing us like a damn fiddle
@@gabrielandy9272 Are going to? Are going to? Have been doing for about a decade (what did you think RTX was?). NPUs are basically tesla cores that run on an apu instead of a massive gpu. Microsoft and apple pushed for integrated npus so laptop and mobile users could have access to the tech that made Nvidia trillions.
Unfortunately this is exactly what happens when there is no adequate competition. Let's remember Intel's monopoly when AMD were one step from bankruptcy. Every single new generation of i-series processes had +5% improved performance at best. Yet still, Intel thought that it is necessary to have a new chipset for almost every single new CPU generation. And in the the end they even started using regular thermal paste INSIDE of their CPUs. If AMD were able to return from dead and become a new leader in CPU segment, Microsoft has all chances to do the same, if they'll change their management which only wants to spy more on their customers and include as many ads as possible into already non-free OS. Corporations, gaming and familiarity - three main reasons why people still use Windows.
To be fair, as someone who hates apple's BS, Apple's software is very efficient, so it can be argued that the base non-pro models can get away much more effectively than on windows (esp after manufacturer crapware is factored in), even though I don't know what swapfile/pagefile on MacOS is.
"ai" (really just glorified autocomplete) got so much worse the last 8 months its actually hillarious. but its still apparently needed in every device they could possibly stuff it in because... money
i can't believe you're not ok with having something screenshooting everything you do on your PC, and let's face reality, upload it or some summary version of it, to be sold to some company in a "do you agree with the new terms of service" without a opt out option. You're being unreasonable man /s
@@Jeffcrocodile yeeah.. when i read about that windows update with this new feature, that was the day i switched to linux. i still have windows on a seperate harddrive but.. i find myself using it less and less. im sick and tired of this (to quote luis rossman here:) "rapist mentality" of corporations becoming more and more apparent each day... i want my dumb things back. 99% of things dont need to connect to the internet imho.
I get where you are coming from, but that's a brain-dead take. Computer hardware is driven by software which can utilize it, the os platform is generally irrelevant (excluding 1st-party software which are tied to the os). Just as an example, lots of games use DLSS upscaling which uses Nvidia Tensor cores (basically an NPU), got nothing to do with Mac OS, Linux or even Windows for that matter.
@@beckergreyso apparently, you don't need it until a lot of software uses it. And until something in Linux actually needs it (not gonna happen in next 5-10 years), tech can be ignored. U actually can't name 5 programs that actually require this thingie, and can't be stupidly emulated by OpenCL or Cuda
This is the best explanation of NPU so far! Other people that have talked about it are either only talking from political/emotional standpoint or from the money aspect. No one has ever discussed it in a practical application angle like you did. Great work, man!
To give some clarity and correct some misunderstandings: NPUs are way cheaper than GPUs. NPUs use a lot less power and are more efficient than GPUs. The total of parameters are cross all layers. To do a neural network (NN) calculation you’ll only need the previous output and the current layer. This is why NPUs don’t need excessive RAM. Specifics: Face recognition on an image only needs a model that is less than 500MB in size and a NPU of about 1TOP. Small Language Models (SLMs) need about 2-6TOPs to have a reasonable response time. Real-time object detection needs about 10TOPs. This is all running under 2W. A 13TOP NPU is about $70 USD.
All depends on precision. What quality and to what precision is the facial recognition? The rest of your comment, you could talk in terms of units/stats/specs, instead of arbitrary terms such as large/small language models. What's large?
@@gilangwahyu4450 Precision of your models. The more decimal points allowed, the more specific your parameters can be, allowing for higher precision on your results. int8, only allows whole 8-bit numbers. That is -128 to 127, since the first bit is reserved to state whether it's positive or negative. fp16 allows 16-bit, floating point numbers. So you can have decimals, hence more precision.
@@Potato_Quality7 Idk about in practice, but in my masters, theoretical NPUs don't have precision the same way GPU chips have. It doesn't use binary or transistors. It is extremely basic multiplication/addition gates, not on/off logic gates. It isn't digital. Because it is the most simple signal additions, it uses the minimum amount of power and has extremely high performance. The trick is to convert the digital signals to analogue for the chip and back to digital.
@@HeyJD123 Yeah, that makes sense. The point there is to spend as little energy as possible while maintaining feasible results. Moreso for a hobbyist. I still am not too informed about the whole thing, but for now, it seems to be more of a gimmick, since what it can do is very specific.
@@Theausomecaleb use git or subversion some kind of version control on that file and then any version you want to revert to can be done. Though there is a bit more technical know how that goes into that and getting all your coworkers to learn that is a uphill battle. Plus source code version control is really good at text files and if you are working with images or binary file types you will probably going to need an extensions for your chosen version control solution that works with the file formats. Otherwise your version control will get slow and bloated keeping every binary version in it's historical archive.
@@Iswimandrun It would be nice if github had a better GUI, it is one of the hardest to use and most annoying pieces of software I use, and there is no reason it needs to be such a pain to use.
I'm impressed that you explained a simple neural network correctly. Not a lot of people actually understand it. There's of course a lot more to it like back propagation and quantisation.
Frankly I don’t want it if it means I can avoid AI features. Unfortunately it looks like the options are “run the AI locally” or “give us your data and run it in the cloud”. I don’t want either. Absolutely hate how AI is being forced into everything now.
I don't know if it's been forced as much as it's supply and demand .. also imagine making an non NPU device and an NPU device of same product - why would anyone keep inventory of non NPU products which almost no one will buy
It's not forced. You are given the chance to use it. It's not like they make you log in to your devices via a chat bot and send email and IMs and everything else. They're making it easy for people who do want it to use it.
now? OCR is AI, google pixels cameras have always been terrible but theyve taken the best photos because of their AI systems, weve got image identification, facial recognition, fingerprint recognition, machine learning, etc. its been coming for a very long time. generative AI is simply the most recent step.
It's been there all along, the only difference is that the it got enough public attention that the focus of marketing shifted towards those features. You probably use a ton of ANN based features without knowing, and have been for years...
@@LeonardTavast That was my point. Apple's stinginess with RAM was all fun and games until they wanted to catch up with AI, and now looks very short-sighted.
@@axi0matic catch up? i seem to remember the neural processor being added to iphones way before there was a real use case for them. oh wait, the measure app! cant think of a more useful use of AI than the amazing measure app.
Curious note: in the early days of NPU in phones, they were less powerful and power efficient than GPUs. Huawei, Samsung, and Qualcomm relied a lot on their GPU to do AI work. Google with Visual Core started to deviate and then with Tensor, few months after Tensor Snapdragon 8 Gen 1 was the first Qualcomm chip with a really powerful NPU.
They were used mostly for that "ok, google" voice recognition back in the day, not this "LLM" bullcrap, it was supposed to be less powerful, but it was actually more power efficient because it wasn't to be used for generative AI which runs it for a longer time.
Dude i must say this is the best video that i have come across that explains the need and differences of various components (such as CPU and GPU) in simple terms.
One way I could be more interested is if the NPU had a little more data types, and would be good for other DSP-style calculations, like impulse responses for audio.
@@shadamethyst1258 An NPU would be way faster than a GPU for convolution while drawing a lot less power. Still, audio is CPU bottlenecked because the processing is serial, not parallel. GPUs are nice for 3D rendering graphics and vector calculations, not designed for AI/machine learning. At least an NPU would be extremely helpful for stem separation, my i7-12700K alone takes like 20 mins to process stuff, i watched the latest Apple M chip doing the same thing and it takes seconds to finish the processing because of the NPU. CPU wise, my 12700K is equivalent to an M2 Pro and slightly more powerful.
The resson you want a dedicated NPU instead of just a GPU. Is because it leaves the GPU free to do its own well optimized Specialized functions. It's the same logic that got us to move on from just having a single Processing Unite
I fear that our devices will become a grid computer for manufacturers against the will of the customer. Sure, it would be efficient and beneficial, but FOR WHOM? A capitalist would just love to abuse our private processors and electricity to benefit their own businesses: what an insideous way is it to disguise corporate carbon emissions by pushing them silently onto the users of their proprietary hardware and software.
@@Kikikan then put the npu on the camera module itself. YT subtitles work don't take much compute, and could be calculated when the video is uploaded to YT itself instead of recalculating over and over again. NPUs just don't help average people much right now, they are too narrow in scope with too few uses. GPUs got made in response to demand, NPUs were made to try to create demand.
As a computer scientist that works with this stuff, yes it is very much needed for AI. The GPU is like a more generalized version of this, with the exception of nvidia Gpus that are built with this extra task. It works, and yes a powerful GPU can do the same… but you could scale a NPU to the scale of a GPU and get the same if not more performance with 10x less power. It would also free up the GPU development to only focus on graphics, and offload neural stuff to the NPU, similar to when GPUs became popular. Meaning GPUs would also get a significant speed up for graphical rendering in the future when paired with a NPU
I would be interested if the NPU can run larger open LLMs utilising RAM which is cheap compared to trying to increase VRAM. Unless you're willing to give Nvida the price of a car and your kidney and maybe your first born child then 24gb and maybe 28 if they bump the 5090 a little. Ultimately I'd love to see expandable VRAM like we had back in the 90's.
The problem with this is that, with time things became more and more reliant on memory placement rather than memory space. That is, nowadays, it's not just whether or not you have the memory somewhere you can find it to process it, it's also whether or not it is in shared memory, local memory, managed memory, mapped memory, etc. Those things all change performance by several orders of magnitude, and so while you could technically run a model using normal RAM, it would still be extremely slow due to bandwidth issues because you'd have to be copying memory all over the place.
@@himalayo As a layman i dont really understand your comment, how does using RAM create bandwidth issues? I always thought RAM has the fastest/broadest interface.
@@flowgangsemaudamartoz7062 put simply: DDR5 ram gets you like 50gb/sec of bandwidth, which sounds like a lot until you realize a 30$ ebay gpu designed ***13 years ago*** can do 128gb/sec, and that is still considered uselessly slow. -=-=-=- put less simply: LLMs need to rerun the whole program for each token (which is 4 characters), and when a flagship LLM is about 300gb of code (after being compressed to 6bit from 16bit), even if you could fit it into regular DDR5 ram, this whole message of mine is equal to at best 25 fucking minutes of processing time. thats not to say that dramatically smaller models dont exist, you can absolutely find a decent LLM thats under 20gb, but processing on a cpu or npu will definitely be a third class citizen type of thing. we need not just massively but exponentially more bandwidth, in order to make running LLMs on a cpu's DDR memory instead of a gpu's GDDR memory an actual thing. hope this cleared things up for you. personally i very much look forward for when LLMs get good enough that we can add them to online dungeons and dragons... though i think the world will find more value in the hardware we invent to run LLMs rather then the LLMs themself.
@@flowgangsemaudamartoz7062I suppose that RAM is just for the space not the actual processing so the RAM and GPU (NPU) would have to constantly be talking to each other and that’s where the bandwidth comes in. I think that’s also why the X3D-chips perform so good in games because you have the cache directly on the chip and are not restricted by RAM (which is relatively slow when compared to on-die-memory)
Working with Embedded Devices in building automation equipped with (micro) NPUs. Targeting Video and Audio based person localization with a low heat and power footprint at low costs :) The video breaks down perfectly the most important parts of NPU computation with well created graphics. Thumbs up 👍🏽
As a programmer, i can think of a handful of non-AI/DL/ML workflows that could see significantly improved performance esp. in 3D processing and post-processing Simply put, anything that processes matrixes in both parallel and branching patterns. Some of this can be simplified to vector SIMD operations, but that's not a 1:1 replacement for parallel matrix operations. So-called Tensor cores are physically designed specifically for matrix operations.
Man, I studied Bionics and Neural Networks back in 1988, at the TU Berlin (Prof. Ingo Rechenberg). That had been much too early. I never have been able to monetize that fundamental knowledge for me. My Diploma Thesis was dealing with engines of actual electrical cars back in 1993, running for about 300km with a high temperature NaS battery. I can say, those Neural Networks do work exactly in that specific way.
Yes, NPU are useful, and it doesn't have to be for "LLMs" or image generators only. It's for stuff like upscaling, OCR, voice-to-text, image object recognization, etc.
I think this is a good thing tbh. Regardless of whether AI is a bubble or not, having hardware that can massively parallelise workloads like the ones demanded by AI is never a downside. It's the same way how GPUs pivoted from graphics to massively parallelised computations. Increasing the chips' raw compute power can only take us so far, at some point we have to bite the bullet and come up with smarter designed chips
battery>storage>performance. Thats what I want, in order. Performance is more than good empugh for amything im going to use (until new games assume I have newer hardware and refuse to work on somethong a year old) so im completely happy to stagnate it to improve battery, and bter storage, denser and faster access would be great
They're designed to monitor everything you say, read, and type. There's no way this ends up going badly.. It will likely only be niche manufacturers who will offer laptops without a neural engine for privacy reasons. After Microsoft's debacle with Copilot, there's no reason to think they won't be pumping this in with the usage data sent back to them.
facts, from their perspective the ROI on data has no real cap so any extra bit they can get (however ethical) can be monetized in some way with near infinite return potential
One in a million people care about privacy, if that. Think how many people use Android and iPhone, almost all of them. I only know one person who cares enough about privacy not to use them. And he still uses Windows... Recall will happen, it has such huge advantages for consumers that the privacy loss won't even factor in to their decision. Soon it won't be on-PC, we will accept that everything gets sent to the cloud because that will make it even more powerful. Eventually we will even get fully homomorphic encryption for NPUs too.
@@jgnhrw One in a million?! A much bigger percentage than that is actually willing to take steps to protect their privacy, such as 17% of people using tracking-free search engines, 31% of people having used VPNs, 15% using encrypted email services, etc. To me this suggests that at least about 15% of poeple care enough about their privacy to be willing to compromise some amount of convenience to protect it, about 1 in 7. Additionally, more than half of people voice concerns about how their data is used instead of simply not giving a shit, but most do not take any steps to prevent this.
@@jgnhrw most people care about data privacy, they just dont understand how it works. Everyone fucking hates personalized ads, the popular consensus IS that its a violation of your privacy. People inherently do not trust these companies. They don't want to be their profit.
@@VictorBash-h3s NPU takes up space that can otherwise be "the type of chip we actually want and pay for in a cpu", such as more cores/cache. And if you watch the video, some cpus have even larger npus. My computers always have powerful GPU so I have no need for NPU and I don't want one, and I don't want a cpu designed with drawbacks necessary to accommodate one. What's more is I don't trust what companies will program NPUs to be used for.
So, chip makers rushed to make a worse version of a GPU, which most people already have, just to say it's an "AI laptop/PC" now? This is absolutely happening because computer sales slowed down after the pandemic. There is zero consumer demand for anything going on here.
Bro, this is better than Nvidia having unquestioned Monopoly on ai. At least with these NPUs you are able to use ddr system RAM which is much cheaper than VRAM specially if it's from Nvidia who intentionally increases the cost of higher VRAM models in unproportionate to VRAM prices and also cutting the VRAM amounts in gaming GPUs so it won't self-compete.
@@jdargui1537 This would just be a waste of silicon. Light graphical loads like that dont need to be moved to a different chip, your GPU can do it with like 1% of its processors...
I've been one of those people that always upgrades to the new device. A.I. is the push I needed to break that cycle. I don't want A.I. on everything I own. I don't want devices that have an intelligent spy built in. Thank you, A.I., you are set to save me a lot of money in the near future.
we don't need NPU. the GPU is not used during most operation, while not as effective as purpose build NPU, it is powerful enough to deal with most of the workload. so what's the point? only mobile devices really want them because of energy reason which most people don't care about on their workstation.
@@VictorBash-h3s So you are going around parroting a contrarian take here? What happened with siri, amazon alexa and google assistant? its been practically a decade and nobody cares about them even though they keep getting forced onto users. AI is literally the same thing and NOBODY wants it. Look around and stop fooling yourself.
This is a really fantastic and educational video. The way you presented a basic neural network was genius. Explaining complex subjects in simple terms is a beautiful thing. Kudos to you. 🎉😊
ngl NPU will increase efficiencies but there will be a workaround for disabling the telemetry/data collection without affecting the program or make them less intrusive. But the NPU may remain idle if there's nothing using it
The simple answer to "disable data collection" would be just to not use windows. Depending on what you need the computer for, Linux would be the answer .
I think you missed the main point of NPUs by saying they are a "minor part" of the die. Thats the appeal! The fact that you can have substantial ai workloads using only a few watts via a tiny component on the cpu is huge. No need for the extra price and power of a dedicated graphics card which pulls 100x more power and also for dedicated gpus require y extra hardware attached to the computer
I think this equation will change as they get larger, though. I wouldn't want to shell out $1000 for a phone or $2000 for a laptop with the puny NPU's they currently have if I could avoid it.
Until you remember that memory capacity and bandwith are a thing, meaning that the NPU is only useful for small things like voice assitants and useless for anything interesting.
8:13 TLDR - Models that can run on your phone are small enough to be trained on a capable home computer, not a requirement that it's trained in a server center So this video is a great introduction, but 8:13 - to give a high-level gist of it, training could be done on any device, but several multiples of the size of your model needs to fit into the device's RAM, and training speed is limited by that and your processing power. The same is for inference / run time, but you only need basically a few multiples of the size of your model in RAM. So large models that needed to be trained in server centers are being run in server centers (you send your input over the internet, then it's computed on their end, then sent back to you), or medium sized ones may have been trained in a server center can run on your computer. Models that can run on your phone are small enough to be trained on a capable home computer EDIT: holy shit I continued watching and this is more that a great intro, this is a great overview of the current space
It depends, if the NPU uses system RAM on a X86 system it would be practical. The issue would be memory bandwith, so we will get PCIe NPU cards in the future with dedicated memory (NRAM?).
Ironically on the mobile devices there's more RAM because the NPU/GPU/CPU has a unified memory model. So if you have a mobile with 16GB of RAM, it technically has more RAM for the NPU than most of the mid-level GPUs on the PC, Nvidia is profiting a lot from just selling memory chips overpriced, because you can't just put more RAM in the GPU like you can on system RAM. But when it comes to the HPC GPUs for Datacenters, then they have removable memory. And before anyone says GDDR is faster, it is not, GDDR4 has the same performance than DDR5, the only difference is that the CPU access the memory at 512 bits (yes, a 64bit CPU access memory at 512 bits, what do you think "DDR" means, nowadays it transfer 4 bits per clock, and with dual channel, that gives you 512 bits ) and the GPU access it as 4096 bits. DDR is a bit faster in random access, and GDDR is a bit faster on sequential bursts, but you can get system memory that has 12-12-12-12 and it'll be faster than GDDR, GDDR is mostly marketing.
Being able to run AI locally can become a really, really big deal on the long term. With a local processing unit, we may be able to choose which AI model we run; We're not dependent on the whims of big IT companies; We can observe how google search isn't as good now as it was in the past. When cloud served AI start to loose it's neutrality because of financial incentives we'll see how much damage it can cause. It may sound silly now, I believe strongly that in the long term, personal NPUs is actually a very positive and important thing for democracy.
Cool! I don't know why I haven't seen your channel before. I appreciate the good animations and graphical explanations and the work that goes into a video like this. So many 'Tubers are using meme after stock clip after another meme... Also, thanks for demystifying the NPU. I was just thinking it may be just a word. Comparing it to the other dedicated accelerators is super meaningful. ...Subscribed ❤
NPUs are just specialized units made for s single task: computing multiplication of matrix with very large sizes, by using many parallel multiplications and a final layer of cumulative additions; the whole being surrounded on input and output by "samplers", i.e. functions that adapt/amplify the signals when they are not lienar by nature (this layer can also use matrix multiplication, or specificat operations like "capping" or sigmoid nonlinear conversion for smooth transitions); Then you need being able to schedule all this task efficiently (when the matrix is too large to fit in NPU registers and the number of multipliers and adders, you need to split the matrix to do that *sequentially*). However there are ways to significantly reduce the amount of calculation needed, notably when matrix are sparses or contain many partial replications: All the optimizations are in finding the best way to split these matrixes into submatrix and eliminate those submatrixes that won't contribute to the final additive result: for that you have not only "trained" the matrix, but also designed some threshold that will allow making the maxtric more sparse. As well the hardware acceleration can help automate the discovery of replicated submatrixes, so that you need to compute one of them and not all, and cache the result or reuse it in the scheduling of the pipeline. Once you've understood that: the NPU is not just for "AI", it has many possible uses in various simulations, and for processing massive amount of datas. When you create a AI model (by learning process) what you create is a set of constant parameters that feed the matrixes. Then you can put mulitple layers of matrices used exactly the same way to create other effects (including retroaction for AI using passive learning and adaptation). The difference with GPUs is that NPUs are more general and can use different trained data models, whereas GPU do this for specialized tasks (with hardwired models and specific optimizations, allowing them to do computation on even massively more input parameters, but with little of no retroaction: the matrices they use are small in GPUs to compute their "shaders", computed as small programs running in parallel with the same kind of data, but this is the massive rate of inputs and outputs needed to produce high resolution image at high farme rates that changes radically things compared to NPU). A GPU may integrate a NPU unit however for solving some local problems, notably for denoising the results of raytracing using a trained perceptual model or for some effects like variable radiance or transparency, and variable diffusion depending on local conditions or on the actual gameplay or during transitions and important transform of the viewed scene). So do we need a NPU? Yes, because it is generalist (but not just for AI or graphics!). They will more efficiently do something that the hardwired models used in GPU or their small number of pgrammable units will not be able to do efficiently (because these programmable units support too many instructions that are difficult to synchronize, so they are limtied by the rate of their clock as their operations are still too much sequential, even if the instruction set is much simpelr than the one used by the CPU). The GPU and NPU however does not replace the CPU which is still there and needed for controling and coordinating all works to the effective demand and to user interactions and many unpredicatable events (that can't be preprogrammed or correctly predicted and for which there's no "early sensors"). The AI concept is not the problem we fear all. The problem is how the input data is collected to train the AI model, or as input to perform the AI computation to get responses, and then where the generated response goes (who will use it, for what, and for how much time). And what will be the impact of decisions made from this uncontroled output (most often invisible to the users that cannot decipher it without designing, using and controling their *own* AI system trained with their own goals): data collection by "big data" third parties (too often very hostile and using very anticompetitive behaviors or using all tricks to escape the legal consequences of their abusive action) is the major problem. In itself a NPU is harmless, but the wya they are being designed and implemented is that users have no control (NPU techniologies are being integrated with networking technologies: you cannot use them with connecting these *integrated NPUs* to these spies that will decide which trained data model the NPU will be allowed to use, and there are hidden models preimplemented in NPUs that are designed to spy you more efficiently).
I still use my first computer...purchased as an adult in 1982. It's no longer a daily driver (close), but it demonstrates my pre-connected tech headspace (300-1200 baud modems are just not comparable to 'connected' in the 21st Century) I don't think my generation is ever going to be acceptant of the surveillance functionality that has been an early use of this tech. (Microsoft's 'Recall' saw me move to Linux as my primary OS as I anticipated the direction Microsoft was heading and wanted no part. Also refuse to land-fill my older, but still absolutely functional hardware, but that's another tale - I've held onto 40 year old hardware after all.) It has been my observation that the under 40 crowd have fully accepted trading privacy for convenience, and this tech is truly for them. It just gives me the heebie-jeebies. From my OS watching my activities, AI narration in RUclips content, to my questioning my own eyes at every turn, I was dragged kicking and screaming into the 21st Century. I was raised in the Golden Era of Science Fiction. Feels like neural networks are a recipe for dystopia. Thank you for placing your sponsorship at the end of your video, I make it a point to always play videos to the end when this is done (I tend to skip 'in video' sponsorships). I don't speak for everyone, but I am far more willing to 'pay' when I've watched a video that I've enjoyed, rather than being interrupted. This is especially true when the content creator creates the ad. It has a more natural flow after the fact. Anyway, it's appreciated enough to be noted here.
NPU's are in the "Field of Dreams" stage, which is "if you build it they will come" state that you introduce the capacity first then you make or deploy the capability later. Every device will eventually become a neuron in the large AI botnet.
Did you consider that perhaps the NPU size (currently) has a pretty much fixed absolute size so that is why the proportion of NPU size / total processor size looks bigger on smaller chips. If that is the case this means that there is not really a priority for NPUs on smaller chips but rather that is just the required size for it to be functional.
To my knowledge there is no reason why the size would be fixed. Also, NPUs being in mobile devices since 2017 while they are only now making it to laptops and still being nowhere near PCs shows that they are very much tied to size
@@TechAltar Couldn't it be one of those cases where one doesn't want to design other versions because of the R&D cost that comes with it? Similar to how CPU cores vary in quantity, but (usually) not in design?
For at least 2 decades, NPU stood for Network Processor Unit, it was used to translate all the URL handles into the actual adress, it was a bit sloppy that Network gave way to Neural since they still exist. Tidbit, Network PUs also use an inteeresting type of DRAM called RLDRAM for Reduced Latency, these DRAMs have about 20* the throughput of regular DRAMs for hash computations. Its a real pity this type of DRAM never found a place in general computing, it could be the basis of outer most cache off chip and soldered RL DRAM with regular slower DRAM DIMMS.
Just a hunch, but I think they'll become essential for resource management and security. There's a lot of use for something that can have a "feeling" about misbehaving software, hardware, and security risks.
I have to say, apple has a good approach here with their unified memory. On a desktop pc you need high amounts of vram.. But nvidia only lets you buy 24gb on "normal" gpus. Meanwhile apple can put as much ram next to their soc's as they want (for 200$ per 4gb lmao), allowing them to load massive models.
I certainly don't need one right now and I can't see myself needing one any time soon. I'm also not getting anything with a new CPU any time soon because my devices still do everything I need them to do
Inference with low latency is the future model. Paper like LLMRoute explain how prompts can distribute the complexity between the SLM (local) and the foundation LLM (cloud) so you can have a conversation with your device.
I think it will be used for other neuron nets not just LLMs like image, audio, video editing or encryption, data compression, computer vision, face recognition, fingerprint readings, translations and I am sure some games would find a use for it.
Computer engineer here: I personally think the idea of an NPU is an excellent idea but probably not for the reasons you’re thinking initially. Of course AI is the hot new thing right now and of course it will more than likely get more integrated and be far less invasive. However, if they maintain this “NPU” as a mainstay in computer architecture, it could mean that programmers would have a whole new frontier outside of the GPU and CPU to run these Models which have already been taking up so much space on the system already! Edit: I just got to the part where he said the same thing 😅 I guess I should watch the entire video before commenting
So its basically a third GPU allocated for inference only. It actually makes sense, A100 is twice as fast and takes half as much power than an RTX 3090 (both released around same year) because its optimized for AI stuff.
0:40 nope - physically, like AppleNeural engine, they are just co-processor on steroids. Application-Specific-Integrated-Circut. Problem is business model - software and applications.
i9-13900HX has twice the battery life of a Ryzen 9 7000 series mobile while watching youtube videos (Jarrod's Tech). Some of those chips can look very competitive vs Apple's stuff, AMD also has nice chips.
Gaming PCs really need good NPUs. They are finally now releasing games with language model powered npcs who know everything that character should know and can talk about these topics, but they have 0 situational awareness. To make these npcs aware of where they are and what is happening around them would require more local processing to add this data to the language model.
As soon as I heard about Recall, I ran out and bought a new computer that DOES NOT have an NPU. It may be the last computer I ever buy. You are NOT taking screenshots of MY computer. Ever.
Gpus with things like Tensor cores are much better npus anyways. Npus are only more powerefficient. (Oh he said exactly that at the end of the video, I wrote it before I finished the video.)
"Gpus with things like Tensor cores are much better npus anyways" Because of the dedicated memory and the speeds. If we have dedicated NPU cards, GPUs would return to their roots (rendering 3D graphics as an example).
@@saricubra2867 Pretty much they already exist, just not for us normal people. Just look at how much shit tons of money Nvidia makes by selling their server AI accelerator cards.
@@geiers6013 Exactly, that stuff isn't available to consumers, definetly not fair. Maybe it will take time until they shrink those sillicon dies for a practical use.
There are available versions, not from Nvidia but other companies. These NPUs just don't have the power of industrial NPUs. On the other hand, we couldn't afford industrial NPUs even if Nvidia would sell them to us. A Coral USB Accelerator using a Google Edge TPU coprocessor is much more affordable. Costs just 60$.
@@geiers6013That's not true. Any Nvidia GeForce card has a processor for neural networks. Even my laptop has this included. But you can also buy dedicated processors designed for AI, like the Coral USB Accelerator which uses a Google Edge TPU coprocessor and costs just 60$. They just are less powerful. An industrial NPU is just out of range for private people because of the extreme costs of such devices .
I think one common misconception is that AI works similar to the human brain. In that we directly link thoughrs to external stimuli. Where the truth is that AI is just brute force in that it has to run through millions if not billions of calculations to arrive at an answer.
The first 500 people to use my link will get a 1 month free trial of Skillshare (sponsored): skl.sh/techaltar06241
useless
love the shirt :)
the real question is, do we really need to know this ??
or you are just making video for the sponsors or.... ?
"silicone valley snake oil such as blockchains"
I will take any unwanted snake oil crypo you are willing to throw away
the evil sponsorship has consumed you
Engineer: AI Needs a ton of ram.
Tim Cook: Great, let’s ship our MacBook *PRO* with 8Gb.
"And when people complain we'll say Apple's 8gb equals to 16gb of MS ram because magic ✨✨"
That's Apple's pricing structure, they want to look competitive with the pricing with the "base" model while charging some insane premium for the usable ones people would actually buy.
@@Artanis5 There's some technical truth to that, at least on iOS and iPadOS devices - they do transparent compression on memory pages not in active use. It's nowhere near 2x though, but using 3GB on an iPhone where 4GB would be necessary on Android is tenable. Apple is largely a marketing company, but this is one of the few examples of good engineering, partially enabled by the fact they've got a higher degree of integration from hw to sw. With that said, if I were to buy an iDevice for a premium price, I'd want the full 4GB or 8GB or whatever and for this tech to be used to increase multi-tasking performance. LPDDR is sufficiently low power that the battery gains from cut down memory are not as useful and it's not as if they're passing the relatively minuscule cost savings on to the customer.
that is decoy marketing strategy!
Very bad situation.
People: we want better battery life in laptops.
Microsoft: we will use NPU to do tons of work in the background.
People: disable it to get better battery life.
it's called over improving something... Windows 10 is good enough, so they made Windows 11, for microsoft, not it's users.
@@stachowiwindows was good enough 10 years ago. They should have stopped then
Much more vocal is the sort of person who solves one side of a Rubik's Cube and gives up.
LTT just showcased the first OLED TV, it was not preceding a world where every monitor is now travel-sized. After sufficient adoption of NPUs they would eventually be many times more efficeint at tasks and save battery more than conventional CPUs ever could.
But it's a question which is better, integrated NPU in the CPU when we look at the comparison with GPUs.
@@sboinkthelegday3892 I just want a well supported dedicated card for the NPU that I can plug in via something like PCIE. Doesn't need to be on the CPU or GPU imo, though a basic one on at least one of those may make some sense for lightweight tasks, similar to graphics modules on CPUs today I think.
@@stachowiso a 7700k is good enough, but people still buy and use 13900ks because it's faster
I already saw laptops with NPUs but 8gb of soldered ram 😂
Meanwhile a used old Thinkpad can be upgraded with 32 gigs and accomplish 90% of an every day users needs WITHOUT an NPU.
Looked at an Apple machine?
@@bhume7535 Correction: Marketed as 32gb max but can actually be stuffed with 64gb with high-quality so-dimms and still works! ;)
@@othername2428 well mine only has one sodimm. So 32 for me.
solder RAM is way faster !
WIN WIN WIN
NPU = not probably used.
Should be called UFB = Used for bloatware
Not Phor U
My first thought was "can I save $10 on the CPU by getting one with the NPU disabled?"
Voice dictation is pretty useful.
@@demolition3612 typing is more useful ngl
- Computer, what big NPU you have!
- I need it to better spy on you.
That's not what a NPU is for. Just because Microsoft requires one for CoPilot Plus's REcall Feature does not mean the NPU is spying on you, the software is.
@@thelaughingmanofficial Strange logic. You could just as well say "Your Nvidia GPU is not made for showing graphics, the drivers for it are". Duh!
@@sem_skywalkerblaming hardware.
For faulty software usage is clown behaviour.
🤡 : The camera was made to spy on you.
@@sem_skywalkeryour Nvidia GPU is made to do parallel processing, which just happens to be very efficient for graphics which is why the drivers render graphics through it
So yeah, the drivers tell the processor to make graphics
@@sem_skywalkeralso an npu is just very efficient at some operations that most ai models use a lot during inference
It's doing everything except for spying on you, especially since the npu being in your laptop means the processing can happen locally without sending your screen to Microsoft
Looking forward to the AI bubble bursting. There might still be some AI that's worthwhile but it's way overhyped and underdeveloped right now
Going to have to agree. I don’t think Nvidia is just going to be the biggest company in the world indefinitely.
Specifically the LLM bubble. AI like AlphaFold is way underhyped.
It's not AI. It's algorithmic transformers. They are not intelligent.
@@ahmedal-hijazi3618If it falls hard enough, the GPU market could open up massively for new competitors. Really unfortunate that for professional work its Nvidia or nothing.
fuck yes I'm so tired of hearing about AI every time I open any tech media! This has happened with the Big Data fad, VR, and other tech trends. But so far this is the most obnoxious one because it is literally everywehre even your grandma is talking about AI for some reason! They are playing us like a damn fiddle
Just when they faced limits with CPU and GPU suddenly out of nowhere appears new NPU thing that you definitely need in every device.
Cap
you only need it if you plan to use AI stuff, the problem is that probally many apps is going to use AI stuff even if you don't want.
@@gabrielandy9272 you don't need anything other than CPU and GPU
@@gabrielandy9272 Are going to? Are going to? Have been doing for about a decade (what did you think RTX was?). NPUs are basically tesla cores that run on an apu instead of a massive gpu. Microsoft and apple pushed for integrated npus so laptop and mobile users could have access to the tech that made Nvidia trillions.
"No questions. Just consoom products and get excited for the next products."
guess that is why Apple was saying 8G is enough in the recent past so they would have a solid up-sale later
Oh that was so stupid. And still is. Seriously really expensive laptops and 8Gbyte standard for only $200 you get 8Gbyte more????
Apple is insane
Unfortunately this is exactly what happens when there is no adequate competition. Let's remember Intel's monopoly when AMD were one step from bankruptcy. Every single new generation of i-series processes had +5% improved performance at best. Yet still, Intel thought that it is necessary to have a new chipset for almost every single new CPU generation. And in the the end they even started using regular thermal paste INSIDE of their CPUs.
If AMD were able to return from dead and become a new leader in CPU segment, Microsoft has all chances to do the same, if they'll change their management which only wants to spy more on their customers and include as many ads as possible into already non-free OS. Corporations, gaming and familiarity - three main reasons why people still use Windows.
@@MasticinaAkicta no. Apple isn’t insane, customers who buy them are.
Apple is very clever. They know how to milk their sheep.
@@MasticinaAkicta you spelled 'insanely greedy' wrong.
To be fair, as someone who hates apple's BS, Apple's software is very efficient, so it can be argued that the base non-pro models can get away much more effectively than on windows (esp after manufacturer crapware is factored in), even though I don't know what swapfile/pagefile on MacOS is.
I've transitioned from 'wow, that's kind of cool. Wonder what they'll come up with next?" to "Fuck off with all the AI, please," in about six months.
Wow, that's kind of cool. I wonder how many ways they'll find to abuse it.
"ai" (really just glorified autocomplete) got so much worse the last 8 months its actually hillarious. but its still apparently needed in every device they could possibly stuff it in because... money
i can't believe you're not ok with having something screenshooting everything you do on your PC, and let's face reality, upload it or some summary version of it, to be sold to some company in a "do you agree with the new terms of service" without a opt out option. You're being unreasonable man /s
@@Jeffcrocodile yeeah.. when i read about that windows update with this new feature, that was the day i switched to linux. i still have windows on a seperate harddrive but.. i find myself using it less and less. im sick and tired of this (to quote luis rossman here:) "rapist mentality" of corporations becoming more and more apparent each day... i want my dumb things back. 99% of things dont need to connect to the internet imho.
@@h0125t There's your problem.. using Win 11.
As long as Linux doesn't, no
I get where you are coming from, but that's a brain-dead take. Computer hardware is driven by software which can utilize it, the os platform is generally irrelevant (excluding 1st-party software which are tied to the os). Just as an example, lots of games use DLSS upscaling which uses Nvidia Tensor cores (basically an NPU), got nothing to do with Mac OS, Linux or even Windows for that matter.
Welcome to the 1990s. Enjoy your stay.
Linux user try not to constantly tell everyone that they use Linux challenge: Difficulty: Impossible
@@beckergreyso apparently, you don't need it until a lot of software uses it. And until something in Linux actually needs it (not gonna happen in next 5-10 years), tech can be ignored.
U actually can't name 5 programs that actually require this thingie, and can't be stupidly emulated by OpenCL or Cuda
@@isadora-6th Did you even watch the video? The name of the game is 'efficiency in low power hardware'.
This is the best explanation of NPU so far! Other people that have talked about it are either only talking from political/emotional standpoint or from the money aspect. No one has ever discussed it in a practical application angle like you did. Great work, man!
That’s RUclips commenters nowadays!
"No one has ever" ...... lol :D
Yeah I'm just glad I learned something lol
To give some clarity and correct some misunderstandings:
NPUs are way cheaper than GPUs.
NPUs use a lot less power and are more efficient than GPUs.
The total of parameters are cross all layers.
To do a neural network (NN) calculation you’ll only need the previous output and the current layer. This is why NPUs don’t need excessive RAM.
Specifics:
Face recognition on an image only needs a model that is less than 500MB in size and a NPU of about 1TOP.
Small Language Models (SLMs) need about 2-6TOPs to have a reasonable response time.
Real-time object detection needs about 10TOPs.
This is all running under 2W.
A 13TOP NPU is about $70 USD.
All depends on precision. What quality and to what precision is the facial recognition? The rest of your comment, you could talk in terms of units/stats/specs, instead of arbitrary terms such as large/small language models. What's large?
Im frequently seeing TOP mentioning data type(?) (Like int8, fp16, etc..) what does it affect?
@@gilangwahyu4450 Precision of your models. The more decimal points allowed, the more specific your parameters can be, allowing for higher precision on your results.
int8, only allows whole 8-bit numbers. That is -128 to 127, since the first bit is reserved to state whether it's positive or negative. fp16 allows 16-bit, floating point numbers. So you can have decimals, hence more precision.
@@Potato_Quality7 Idk about in practice, but in my masters, theoretical NPUs don't have precision the same way GPU chips have. It doesn't use binary or transistors. It is extremely basic multiplication/addition gates, not on/off logic gates. It isn't digital.
Because it is the most simple signal additions, it uses the minimum amount of power and has extremely high performance. The trick is to convert the digital signals to analogue for the chip and back to digital.
@@HeyJD123 Yeah, that makes sense. The point there is to spend as little energy as possible while maintaining feasible results. Moreso for a hobbyist. I still am not too informed about the whole thing, but for now, it seems to be more of a gimmick, since what it can do is very specific.
Yea recall is a huge security and privacy nightmare
I agree but man would it be nice to roll back when a coworker messed up a Excel formula 😂.
Only real positive in my world lol
@@Theausomecaleb use git or subversion some kind of version control on that file and then any version you want to revert to can be done. Though there is a bit more technical know how that goes into that and getting all your coworkers to learn that is a uphill battle. Plus source code version control is really good at text files and if you are working with images or binary file types you will probably going to need an extensions for your chosen version control solution that works with the file formats. Otherwise your version control will get slow and bloated keeping every binary version in it's historical archive.
By never sending your screen anywhere over the internet and instead doing all processing locally? Sure
@@PFnoveit will require modyfing registry to not send knowing Microsoft and how they love to make spyware
@@Iswimandrun It would be nice if github had a better GUI, it is one of the hardest to use and most annoying pieces of software I use, and there is no reason it needs to be such a pain to use.
I'm impressed that you explained a simple neural network correctly. Not a lot of people actually understand it. There's of course a lot more to it like back propagation and quantisation.
Frankly I don’t want it if it means I can avoid AI features. Unfortunately it looks like the options are “run the AI locally” or “give us your data and run it in the cloud”. I don’t want either.
Absolutely hate how AI is being forced into everything now.
I don't know if it's been forced as much as it's supply and demand .. also imagine making an non NPU device and an NPU device of same product - why would anyone keep inventory of non NPU products which almost no one will buy
It's not forced. You are given the chance to use it. It's not like they make you log in to your devices via a chat bot and send email and IMs and everything else. They're making it easy for people who do want it to use it.
now? OCR is AI, google pixels cameras have always been terrible but theyve taken the best photos because of their AI systems, weve got image identification, facial recognition, fingerprint recognition, machine learning, etc. its been coming for a very long time. generative AI is simply the most recent step.
If I can run NPU with Ollima then I am happy camper
It's been there all along, the only difference is that the it got enough public attention that the focus of marketing shifted towards those features. You probably use a ton of ANN based features without knowing, and have been for years...
I'm sure Apple's 8GB laptops will be totally 'equivalent' to 16GB for AI use...
Apple fan boy 🍼
According to themselves, yes. When it comes to actually fitting the whole model in memory, no.
@@abhijith6919 Hardly. I was being sarcastic.
@@LeonardTavast That was my point. Apple's stinginess with RAM was all fun and games until they wanted to catch up with AI, and now looks very short-sighted.
@@axi0matic catch up? i seem to remember the neural processor being added to iphones way before there was a real use case for them. oh wait, the measure app! cant think of a more useful use of AI than the amazing measure app.
the realization that 2017 is seven years ago 😳
what ??? really ..... o no time fly too fast ....
when you realize 2015 was five months ago: MIND BLOWN
why did you have to go do the math 😭
💀
we living in simulation ever since 2020 someone please wake up 😭
Curious note: in the early days of NPU in phones, they were less powerful and power efficient than GPUs. Huawei, Samsung, and Qualcomm relied a lot on their GPU to do AI work. Google with Visual Core started to deviate and then with Tensor, few months after Tensor Snapdragon 8 Gen 1 was the first Qualcomm chip with a really powerful NPU.
They were used mostly for that "ok, google" voice recognition back in the day, not this "LLM" bullcrap, it was supposed to be less powerful, but it was actually more power efficient because it wasn't to be used for generative AI which runs it for a longer time.
Dude i must say this is the best video that i have come across that explains the need and differences of various components (such as CPU and GPU) in simple terms.
We need a video explaining the clean shave, brutal betrayal 😂
Certainly, quite shocked when I see him 😅
dude looks great clean shaven
No wonder I feels something is off..
Lost password, had to create egg account and it takes 2000 posts to change profile pic.
Except this is in digital twin.
clean shaven martin jumpscare (very scary)
One way I could be more interested is if the NPU had a little more data types, and would be good for other DSP-style calculations, like impulse responses for audio.
From reading the docs, you *could* use it for convolution and maybe some more operations, but I don't know how better they are than a GPU
@@shadamethyst1258 An NPU would be way faster than a GPU for convolution while drawing a lot less power. Still, audio is CPU bottlenecked because the processing is serial, not parallel. GPUs are nice for 3D rendering graphics and vector calculations, not designed for AI/machine learning.
At least an NPU would be extremely helpful for stem separation, my i7-12700K alone takes like 20 mins to process stuff, i watched the latest Apple M chip doing the same thing and it takes seconds to finish the processing because of the NPU. CPU wise, my 12700K is equivalent to an M2 Pro and slightly more powerful.
Qualcomm hexagon npu was previously a DSP with matrix extensions
AMD's AI engines are direct descendants of the DSP engines in Xilinx FPGAs!
The resson you want a dedicated NPU instead of just a GPU.
Is because it leaves the GPU free to do its own well optimized Specialized functions.
It's the same logic that got us to move on from just having a single Processing Unite
That, and that CPUs were and still are enormously bad at extremely parallelized tasks.
10 years from now we get zpus or some other next processing unit.
Thank you sooo much for explaining this more in detail instead of just saying "neuronal network" as if everybody knows how they work.
Wow , you explained it in such an easy way. Amazing video
Glad you liked it
Whats concerns me is that at some point in the future you will have massive for profit corporations having easy access to billions of NPU's.
I fear that our devices will become a grid computer for manufacturers against the will of the customer.
Sure, it would be efficient and beneficial, but FOR WHOM?
A capitalist would just love to abuse our private processors and electricity to benefit their own businesses: what an insideous way is it to disguise corporate carbon emissions by pushing them silently onto the users of their proprietary hardware and software.
We do not want AI. We want Performance and Battery Life. PERIOD
Exactly, that is why NPUs are a thing, they increase both performance and battery life when running AI tasks.
@@Kikikan no one was saying anything about wanting running AI tasks ;)
Amen. The tech companies are delusional. Either that or they’ve got an ulterior motive for pushing this crap on us.
@@IvoPavlik Then I also hope you do not take pictures with your phone, or use auto-generated subtitles at RUclips, because those are also AI tasks ;)
@@Kikikan then put the npu on the camera module itself. YT subtitles work don't take much compute, and could be calculated when the video is uploaded to YT itself instead of recalculating over and over again. NPUs just don't help average people much right now, they are too narrow in scope with too few uses. GPUs got made in response to demand, NPUs were made to try to create demand.
As a computer scientist that works with this stuff, yes it is very much needed for AI. The GPU is like a more generalized version of this, with the exception of nvidia Gpus that are built with this extra task. It works, and yes a powerful GPU can do the same… but you could scale a NPU to the scale of a GPU and get the same if not more performance with 10x less power. It would also free up the GPU development to only focus on graphics, and offload neural stuff to the NPU, similar to when GPUs became popular. Meaning GPUs would also get a significant speed up for graphical rendering in the future when paired with a NPU
can't wait till computers have 30 different processing units which get installed like ram cards and can be switched around by plugging them in.
NPU
. Normal processing unit
IGPUS - Integrated Graphic that Sucks
GPU - Grossly Priced Unit
I was wondering what it stood for. For some silly reason my brain said "nothing processing unit" 😊 thanks 👍
@@vevisa3287 Goat Packaged Unit
Neural processing unit (not normal)
I would be interested if the NPU can run larger open LLMs utilising RAM which is cheap compared to trying to increase VRAM. Unless you're willing to give Nvida the price of a car and your kidney and maybe your first born child then 24gb and maybe 28 if they bump the 5090 a little. Ultimately I'd love to see expandable VRAM like we had back in the 90's.
The problem with this is that, with time things became more and more reliant on memory placement rather than memory space. That is, nowadays, it's not just whether or not you have the memory somewhere you can find it to process it, it's also whether or not it is in shared memory, local memory, managed memory, mapped memory, etc. Those things all change performance by several orders of magnitude, and so while you could technically run a model using normal RAM, it would still be extremely slow due to bandwidth issues because you'd have to be copying memory all over the place.
@@himalayo As a layman i dont really understand your comment, how does using RAM create bandwidth issues? I always thought RAM has the fastest/broadest interface.
@@flowgangsemaudamartoz7062
put simply: DDR5 ram gets you like 50gb/sec of bandwidth, which sounds like a lot until you realize a 30$ ebay gpu designed ***13 years ago*** can do 128gb/sec, and that is still considered uselessly slow.
-=-=-=-
put less simply: LLMs need to rerun the whole program for each token (which is 4 characters), and when a flagship LLM is about 300gb of code (after being compressed to 6bit from 16bit), even if you could fit it into regular DDR5 ram, this whole message of mine is equal to at best 25 fucking minutes of processing time.
thats not to say that dramatically smaller models dont exist, you can absolutely find a decent LLM thats under 20gb, but processing on a cpu or npu will definitely be a third class citizen type of thing.
we need not just massively but exponentially more bandwidth, in order to make running LLMs on a cpu's DDR memory instead of a gpu's GDDR memory an actual thing.
hope this cleared things up for you.
personally i very much look forward for when LLMs get good enough that we can add them to online dungeons and dragons... though i think the world will find more value in the hardware we invent to run LLMs rather then the LLMs themself.
@@flowgangsemaudamartoz7062I suppose that RAM is just for the space not the actual processing so the RAM and GPU (NPU) would have to constantly be talking to each other and that’s where the bandwidth comes in. I think that’s also why the X3D-chips perform so good in games because you have the cache directly on the chip and are not restricted by RAM (which is relatively slow when compared to on-die-memory)
Working with Embedded Devices in building automation equipped with (micro) NPUs. Targeting Video and Audio based person localization with a low heat and power footprint at low costs :)
The video breaks down perfectly the most important parts of NPU computation with well created graphics. Thumbs up 👍🏽
As a programmer, i can think of a handful of non-AI/DL/ML workflows that could see significantly improved performance esp. in 3D processing and post-processing
Simply put, anything that processes matrixes in both parallel and branching patterns.
Some of this can be simplified to vector SIMD operations, but that's not a 1:1 replacement for parallel matrix operations.
So-called Tensor cores are physically designed specifically for matrix operations.
Isn't that basically what GPU's are for?
Man, I studied Bionics and Neural Networks back in 1988, at the TU Berlin (Prof. Ingo Rechenberg). That had been much too early. I never have been able to monetize that fundamental knowledge for me. My Diploma Thesis was dealing with engines of actual electrical cars back in 1993, running for about 300km with a high temperature NaS battery. I can say, those Neural Networks do work exactly in that specific way.
Yes, NPU are useful, and it doesn't have to be for "LLMs" or image generators only. It's for stuff like upscaling, OCR, voice-to-text, image object recognization, etc.
So useless for 95% of people.
Thank you for giving this beardless man the chance to upload a guest video on your channel - looking forward to a video by the real Martin next time 👍
12:53 the use of the word "was" in this sentence made me giggle and rub my hands together all evil like
I think this is a good thing tbh. Regardless of whether AI is a bubble or not, having hardware that can massively parallelise workloads like the ones demanded by AI is never a downside. It's the same way how GPUs pivoted from graphics to massively parallelised computations. Increasing the chips' raw compute power can only take us so far, at some point we have to bite the bullet and come up with smarter designed chips
battery>storage>performance.
Thats what I want, in order. Performance is more than good empugh for amything im going to use (until new games assume I have newer hardware and refuse to work on somethong a year old) so im completely happy to stagnate it to improve battery, and bter storage, denser and faster access would be great
They're designed to monitor everything you say, read, and type. There's no way this ends up going badly.. It will likely only be niche manufacturers who will offer laptops without a neural engine for privacy reasons. After Microsoft's debacle with Copilot, there's no reason to think they won't be pumping this in with the usage data sent back to them.
facts, from their perspective the ROI on data has no real cap so any extra bit they can get (however ethical) can be monetized in some way with near infinite return potential
in lack of NPU, they could just use the GPU instead. Not having one is very most likely not going to be an effective protection.
One in a million people care about privacy, if that. Think how many people use Android and iPhone, almost all of them. I only know one person who cares enough about privacy not to use them. And he still uses Windows... Recall will happen, it has such huge advantages for consumers that the privacy loss won't even factor in to their decision. Soon it won't be on-PC, we will accept that everything gets sent to the cloud because that will make it even more powerful. Eventually we will even get fully homomorphic encryption for NPUs too.
@@jgnhrw One in a million?! A much bigger percentage than that is actually willing to take steps to protect their privacy, such as 17% of people using tracking-free search engines, 31% of people having used VPNs, 15% using encrypted email services, etc.
To me this suggests that at least about 15% of poeple care enough about their privacy to be willing to compromise some amount of convenience to protect it, about 1 in 7.
Additionally, more than half of people voice concerns about how their data is used instead of simply not giving a shit, but most do not take any steps to prevent this.
@@jgnhrw most people care about data privacy, they just dont understand how it works. Everyone fucking hates personalized ads, the popular consensus IS that its a violation of your privacy. People inherently do not trust these companies. They don't want to be their profit.
The NPU for most people is to help companies spy on you using your own hardware to assist.
CPU: Work
GPU: Entertainment
NPU: Eslavement
This is, hands-down, the best video on this topic on youtube. Great job! Hope you get ta TON of shares!
Every time i see a die-shot where there is an NPU i cry for the cores/gpu/cache that could have taken that space.
Yesssss 1000x yes, same thing I think, specially since gpu can run ai if that was the case... Such a waste of silicon and everyone's time
Exactly. They could be giving us CPUs that are 2 - 3x as powerful, instead they jam in extra spyware that you can't opt out of
Spoken like a person who doesn't know Jack about how pcs work@@chrismurphy2769
Cores are literally the same size lad. What are you talking about
@@VictorBash-h3s NPU takes up space that can otherwise be "the type of chip we actually want and pay for in a cpu", such as more cores/cache.
And if you watch the video, some cpus have even larger npus.
My computers always have powerful GPU so I have no need for NPU and I don't want one, and I don't want a cpu designed with drawbacks necessary to accommodate one.
What's more is I don't trust what companies will program NPUs to be used for.
So, chip makers rushed to make a worse version of a GPU, which most people already have, just to say it's an "AI laptop/PC" now?
This is absolutely happening because computer sales slowed down after the pandemic. There is zero consumer demand for anything going on here.
Bro, this is better than Nvidia having unquestioned Monopoly on ai. At least with these NPUs you are able to use ddr system RAM which is much cheaper than VRAM specially if it's from Nvidia who intentionally increases the cost of higher VRAM models in unproportionate to VRAM prices and also cutting the VRAM amounts in gaming GPUs so it won't self-compete.
But ai is definitely underdeveloped for mainstream consumption and overpushed by tech companies when there isn't proper value to NPUs yet.
Would be useful if NPUs can act as DSP, 2D GPU maybe through image generation then mainstream consumers start getting use from those npus
@@jdargui1537 Integrated graphics could already do that, except integrated graphics is far more useful than an NPU.
@@jdargui1537 This would just be a waste of silicon. Light graphical loads like that dont need to be moved to a different chip, your GPU can do it with like 1% of its processors...
I've been one of those people that always upgrades to the new device. A.I. is the push I needed to break that cycle. I don't want A.I. on everything I own. I don't want devices that have an intelligent spy built in.
Thank you, A.I., you are set to save me a lot of money in the near future.
we don't need NPU. the GPU is not used during most operation, while not as effective as purpose build NPU, it is powerful enough to deal with most of the workload. so what's the point? only mobile devices really want them because of energy reason which most people don't care about on their workstation.
You dont*
@@VictorBash-h3s So you are going around parroting a contrarian take here? What happened with siri, amazon alexa and google assistant? its been practically a decade and nobody cares about them even though they keep getting forced onto users. AI is literally the same thing and NOBODY wants it. Look around and stop fooling yourself.
That honestly was the best and most intuitive breakdown of how neural networks actually work. Amazing job!
Still one of the best tech channels on RUclips. This video is extremely well done
Just wanted to chime in on the amazing clarity of your explanations in this video. I wish even 1% of the videos I watch were as brilliantly explained.
Good job but I'd add two more very important factors to the AI chip requirements: 1) it needs to use low power and 2) it needs to be low-cost.
This is a really fantastic and educational video. The way you presented a basic neural network was genius. Explaining complex subjects in simple terms is a beautiful thing. Kudos to you. 🎉😊
ngl NPU will increase efficiencies but there will be a workaround for disabling the telemetry/data collection without affecting the program or make them less intrusive. But the NPU may remain idle if there's nothing using it
The simple answer to "disable data collection" would be just to not use windows.
Depending on what you need the computer for, Linux would be the answer .
Bro expalined all my doubts in a just one video. Thanks this amazing i actually understand few things i didn't know.
You are amazing! I work in deep learning and really liked the way you explained the math.
Just few generations ago, Snapdragon was their ISP to do NPU calculations.
I think you missed the main point of NPUs by saying they are a "minor part" of the die. Thats the appeal! The fact that you can have substantial ai workloads using only a few watts via a tiny component on the cpu is huge.
No need for the extra price and power of a dedicated graphics card which pulls 100x more power and also for dedicated gpus require y extra hardware attached to the computer
I think this equation will change as they get larger, though. I wouldn't want to shell out $1000 for a phone or $2000 for a laptop with the puny NPU's they currently have if I could avoid it.
Until you remember that memory capacity and bandwith are a thing, meaning that the NPU
is only useful for small things like voice assitants and useless for anything interesting.
Until you realize that the real world difference between 2W and 20W is negligible on a desktop
8:13 TLDR - Models that can run on your phone are small enough to be trained on a capable home computer, not a requirement that it's trained in a server center
So this video is a great introduction, but 8:13 - to give a high-level gist of it, training could be done on any device, but several multiples of the size of your model needs to fit into the device's RAM, and training speed is limited by that and your processing power. The same is for inference / run time, but you only need basically a few multiples of the size of your model in RAM. So large models that needed to be trained in server centers are being run in server centers (you send your input over the internet, then it's computed on their end, then sent back to you), or medium sized ones may have been trained in a server center can run on your computer. Models that can run on your phone are small enough to be trained on a capable home computer
EDIT: holy shit I continued watching and this is more that a great intro, this is a great overview of the current space
Npu can only run small ai models in a PC. There isn't enough ram to run llm
It depends, if the NPU uses system RAM on a X86 system it would be practical. The issue would be memory bandwith, so we will get PCIe NPU cards in the future with dedicated memory (NRAM?).
Ironically on the mobile devices there's more RAM because the NPU/GPU/CPU has a unified memory model. So if you have a mobile with 16GB of RAM, it technically has more RAM for the NPU than most of the mid-level GPUs on the PC, Nvidia is profiting a lot from just selling memory chips overpriced, because you can't just put more RAM in the GPU like you can on system RAM.
But when it comes to the HPC GPUs for Datacenters, then they have removable memory.
And before anyone says GDDR is faster, it is not, GDDR4 has the same performance than DDR5, the only difference is that the CPU access the memory at 512 bits (yes, a 64bit CPU access memory at 512 bits, what do you think "DDR" means, nowadays it transfer 4 bits per clock, and with dual channel, that gives you 512 bits ) and the GPU access it as 4096 bits. DDR is a bit faster in random access, and GDDR is a bit faster on sequential bursts, but you can get system memory that has 12-12-12-12 and it'll be faster than GDDR, GDDR is mostly marketing.
@@saricubra2867please no more things on the shared RAM bus degrading the performance of everything that uses it
@@monad_tcpNot faster than GDDR6X
This might be one of the best videos you've ever created. Amazingly done!
Its great to see that we are seeing developments in machine learning realtime beard removal models!
your brain is now online?
Exceptional video! I’ve never seen neural nets explained so brilliantly!
whe-whe-where is your beard Tech Altar sensei!!??
00:34 and glorious 3D televisions!
Being able to run AI locally can become a really, really big deal on the long term.
With a local processing unit, we may be able to choose which AI model we run; We're not dependent on the whims of big IT companies; We can observe how google search isn't as good now as it was in the past.
When cloud served AI start to loose it's neutrality because of financial incentives we'll see how much damage it can cause. It may sound silly now, I believe strongly that in the long term, personal NPUs is actually a very positive and important thing for democracy.
Cool! I don't know why I haven't seen your channel before. I appreciate the good animations and graphical explanations and the work that goes into a video like this. So many 'Tubers are using meme after stock clip after another meme... Also, thanks for demystifying the NPU. I was just thinking it may be just a word. Comparing it to the other dedicated accelerators is super meaningful. ...Subscribed ❤
that's a hell of a neat informative explanation. nicely done.
I can't even begin to explain how brilliant this video is!!
10:24 I need that too
No beard you feel so different xD is this a different set?
AI removed the beard.
This channel informs, entertains, and educates... WELL DONE!
NPUs are just specialized units made for s single task: computing multiplication of matrix with very large sizes, by using many parallel multiplications and a final layer of cumulative additions; the whole being surrounded on input and output by "samplers", i.e. functions that adapt/amplify the signals when they are not lienar by nature (this layer can also use matrix multiplication, or specificat operations like "capping" or sigmoid nonlinear conversion for smooth transitions); Then you need being able to schedule all this task efficiently (when the matrix is too large to fit in NPU registers and the number of multipliers and adders, you need to split the matrix to do that *sequentially*).
However there are ways to significantly reduce the amount of calculation needed, notably when matrix are sparses or contain many partial replications: All the optimizations are in finding the best way to split these matrixes into submatrix and eliminate those submatrixes that won't contribute to the final additive result: for that you have not only "trained" the matrix, but also designed some threshold that will allow making the maxtric more sparse. As well the hardware acceleration can help automate the discovery of replicated submatrixes, so that you need to compute one of them and not all, and cache the result or reuse it in the scheduling of the pipeline.
Once you've understood that: the NPU is not just for "AI", it has many possible uses in various simulations, and for processing massive amount of datas. When you create a AI model (by learning process) what you create is a set of constant parameters that feed the matrixes. Then you can put mulitple layers of matrices used exactly the same way to create other effects (including retroaction for AI using passive learning and adaptation).
The difference with GPUs is that NPUs are more general and can use different trained data models, whereas GPU do this for specialized tasks (with hardwired models and specific optimizations, allowing them to do computation on even massively more input parameters, but with little of no retroaction: the matrices they use are small in GPUs to compute their "shaders", computed as small programs running in parallel with the same kind of data, but this is the massive rate of inputs and outputs needed to produce high resolution image at high farme rates that changes radically things compared to NPU). A GPU may integrate a NPU unit however for solving some local problems, notably for denoising the results of raytracing using a trained perceptual model or for some effects like variable radiance or transparency, and variable diffusion depending on local conditions or on the actual gameplay or during transitions and important transform of the viewed scene).
So do we need a NPU? Yes, because it is generalist (but not just for AI or graphics!). They will more efficiently do something that the hardwired models used in GPU or their small number of pgrammable units will not be able to do efficiently (because these programmable units support too many instructions that are difficult to synchronize, so they are limtied by the rate of their clock as their operations are still too much sequential, even if the instruction set is much simpelr than the one used by the CPU).
The GPU and NPU however does not replace the CPU which is still there and needed for controling and coordinating all works to the effective demand and to user interactions and many unpredicatable events (that can't be preprogrammed or correctly predicted and for which there's no "early sensors").
The AI concept is not the problem we fear all. The problem is how the input data is collected to train the AI model, or as input to perform the AI computation to get responses, and then where the generated response goes (who will use it, for what, and for how much time). And what will be the impact of decisions made from this uncontroled output (most often invisible to the users that cannot decipher it without designing, using and controling their *own* AI system trained with their own goals): data collection by "big data" third parties (too often very hostile and using very anticompetitive behaviors or using all tricks to escape the legal consequences of their abusive action) is the major problem.
In itself a NPU is harmless, but the wya they are being designed and implemented is that users have no control (NPU techniologies are being integrated with networking technologies: you cannot use them with connecting these *integrated NPUs* to these spies that will decide which trained data model the NPU will be allowed to use, and there are hidden models preimplemented in NPUs that are designed to spy you more efficiently).
I still use my first computer...purchased as an adult in 1982. It's no longer a daily driver (close), but it demonstrates my pre-connected tech headspace (300-1200 baud modems are just not comparable to 'connected' in the 21st Century)
I don't think my generation is ever going to be acceptant of the surveillance functionality that has been an early use of this tech. (Microsoft's 'Recall' saw me move to Linux as my primary OS as I anticipated the direction Microsoft was heading and wanted no part. Also refuse to land-fill my older, but still absolutely functional hardware, but that's another tale - I've held onto 40 year old hardware after all.) It has been my observation that the under 40 crowd have fully accepted trading privacy for convenience, and this tech is truly for them. It just gives me the heebie-jeebies.
From my OS watching my activities, AI narration in RUclips content, to my questioning my own eyes at every turn, I was dragged kicking and screaming into the 21st Century. I was raised in the Golden Era of Science Fiction. Feels like neural networks are a recipe for dystopia.
Thank you for placing your sponsorship at the end of your video, I make it a point to always play videos to the end when this is done (I tend to skip 'in video' sponsorships). I don't speak for everyone, but I am far more willing to 'pay' when I've watched a video that I've enjoyed, rather than being interrupted. This is especially true when the content creator creates the ad. It has a more natural flow after the fact. Anyway, it's appreciated enough to be noted here.
NPU's are in the "Field of Dreams" stage, which is "if you build it they will come" state that you introduce the capacity first then you make or deploy the capability later. Every device will eventually become a neuron in the large AI botnet.
Did you consider that perhaps the NPU size (currently) has a pretty much fixed absolute size so that is why the proportion of NPU size / total processor size looks bigger on smaller chips. If that is the case this means that there is not really a priority for NPUs on smaller chips but rather that is just the required size for it to be functional.
To my knowledge there is no reason why the size would be fixed. Also, NPUs being in mobile devices since 2017 while they are only now making it to laptops and still being nowhere near PCs shows that they are very much tied to size
@@TechAltar Couldn't it be one of those cases where one doesn't want to design other versions because of the R&D cost that comes with it? Similar to how CPU cores vary in quantity, but (usually) not in design?
For at least 2 decades, NPU stood for Network Processor Unit, it was used to translate all the URL handles into the actual adress, it was a bit sloppy that Network gave way to Neural since they still exist. Tidbit, Network PUs also use an inteeresting type of DRAM called RLDRAM for Reduced Latency, these DRAMs have about 20* the throughput of regular DRAMs for hash computations. Its a real pity this type of DRAM never found a place in general computing, it could be the basis of outer most cache off chip and soldered RL DRAM with regular slower DRAM DIMMS.
3:25 It's wild CPUS used to be ART made of wood and marble and gold and now it's just glass and plastic how tech has fallen.
Just a hunch, but I think they'll become essential for resource management and security. There's a lot of use for something that can have a "feeling" about misbehaving software, hardware, and security risks.
I have to say, apple has a good approach here with their unified memory.
On a desktop pc you need high amounts of vram..
But nvidia only lets you buy 24gb on "normal" gpus.
Meanwhile apple can put as much ram next to their soc's as they want (for 200$ per 4gb lmao), allowing them to load massive models.
I always come here for the extremely catchy background music.. especially for the Friday show, but this time it had something about NPUs.. neat.
I certainly don't need one right now and I can't see myself needing one any time soon.
I'm also not getting anything with a new CPU any time soon because my devices still do everything I need them to do
Best explanation - EVER i have seen..please explain bias and activation functions !!
Hopefully this means that GPUs wont cost an arm and a leg because everyone will be buying NPUs instead for AI/cryptomining
Don't think of it as failure. Think of it as time-released success.
People here making fun of your shaved beard but I say you look very young. More like you hit puberty. Very cute.
Inference with low latency is the future model. Paper like LLMRoute explain how prompts can distribute the complexity between the SLM (local) and the foundation LLM (cloud) so you can have a conversation with your device.
I agree with Lisa.
More Tops. 🌈
Free the nipple!
I think it will be used for other neuron nets not just LLMs like image, audio, video editing or encryption, data compression, computer vision, face recognition, fingerprint readings, translations and I am sure some games would find a use for it.
Very good video, common TechAltar W.
Glad you liked it!
Computer engineer here: I personally think the idea of an NPU is an excellent idea but probably not for the reasons you’re thinking initially.
Of course AI is the hot new thing right now and of course it will more than likely get more integrated and be far less invasive. However, if they maintain this “NPU” as a mainstay in computer architecture, it could mean that programmers would have a whole new frontier outside of the GPU and CPU to run these Models which have already been taking up so much space on the system already!
Edit: I just got to the part where he said the same thing 😅 I guess I should watch the entire video before commenting
I want an efficient laptop. I dont want windows to push their ai crap into my throat
Did you even watch the video? NPUs are efficient af.
So its basically a third GPU allocated for inference only. It actually makes sense, A100 is twice as fast and takes half as much power than an RTX 3090 (both released around same year) because its optimized for AI stuff.
0:40 nope - physically, like AppleNeural engine, they are just co-processor on steroids. Application-Specific-Integrated-Circut. Problem is business model - software and applications.
As a long time watcher of this channel this is one of the best explainers I’ve seen on this topic. Well done🎉
*N O .*
*We need CPUs that dont eat the battery like Chrome eats ram and don't crash like Windows Me.* * _coughs_ * Intel * _coughs_ *
Intel - degradation inside!^^
i9-13900HX has twice the battery life of a Ryzen 9 7000 series mobile while watching youtube videos (Jarrod's Tech). Some of those chips can look very competitive vs Apple's stuff, AMD also has nice chips.
Gaming PCs really need good NPUs. They are finally now releasing games with language model powered npcs who know everything that character should know and can talk about these topics, but they have 0 situational awareness. To make these npcs aware of where they are and what is happening around them would require more local processing to add this data to the language model.
As soon as I heard about Recall, I ran out and bought a new computer that DOES NOT have an NPU. It may be the last computer I ever buy. You are NOT taking screenshots of MY computer. Ever.
you could just use a windows-skinned linux OS. Lets you have up to date hardware AND no microsoft fuckery
Thisnis the first time I really understood NPU, thanks 👍
Gpus with things like Tensor cores are much better npus anyways. Npus are only more powerefficient. (Oh he said exactly that at the end of the video, I wrote it before I finished the video.)
"Gpus with things like Tensor cores are much better npus anyways"
Because of the dedicated memory and the speeds. If we have dedicated NPU cards, GPUs would return to their roots (rendering 3D graphics as an example).
@@saricubra2867 Pretty much they already exist, just not for us normal people. Just look at how much shit tons of money Nvidia makes by selling their server AI accelerator cards.
@@geiers6013 Exactly, that stuff isn't available to consumers, definetly not fair.
Maybe it will take time until they shrink those sillicon dies for a practical use.
There are available versions, not from Nvidia but other companies. These NPUs just don't have the power of industrial NPUs. On the other hand, we couldn't afford industrial NPUs even if Nvidia would sell them to us.
A Coral USB Accelerator using a Google Edge TPU coprocessor is much more affordable. Costs just 60$.
@@geiers6013That's not true. Any Nvidia GeForce card has a processor for neural networks. Even my laptop has this included.
But you can also buy dedicated processors designed for AI, like the Coral USB Accelerator which uses a Google Edge TPU coprocessor and costs just 60$. They just are less powerful. An industrial NPU is just out of range for private people because of the extreme costs of such devices .
I think one common misconception is that AI works similar to the human brain. In that we directly link thoughrs to external stimuli. Where the truth is that AI is just brute force in that it has to run through millions if not billions of calculations to arrive at an answer.
Tesla cars have had a NPU since 2017, which has been finally utilized in 2023 for FSD version 12. The software is only starting to use the NPUs