An ideal AI accelerator for me would generate more tokens per second in large language models (Llama 2, Mistral, etc.) and speed up image generation with AUTOMATIC1111/Stable Diffusion. If it had enough RAM/power it could also be used to train or fine-tune AI models.
Feels like a solution looking for a problem at the moment. I could see these things being used to enable some really crazy procedural experiences in games, but it feels like that could be done with the tensor cores on a GPU as-is.
This kind of product has already existed in the Chinese market for years... The first major product was released around 2019, I think, and provided 8 TOPS of AI computing performance.
The big problem with AI accelerators is that you need a ton of VRAM for useful AI (I have a 24GB GPU and it's barely enough), which is likely the limiting factor in terms of price. Though this might have a market for Raspberry Pi kind of AI stuff.
I would call it the "next step in evolution" after SSDs and Ray Tracing, not the "next 3Dfx". By the way, I gamed on my 3Dfx Voodoo 2 just this week 🙂
Sorry, so you need a dedicated motherboard to plug this thing into? Will they work in tandem? Could we externalize these and jack them through a dedicated port?
Are there more details about the performance with Stable Diffusion you can see on the right-hand screen? An accelerator for Stable Diffusion or LLaMa would be very interesting!
This would be useful with multi-M.2 adapter cards, assuming it works out of system RAM (so DDR speed would have to be high), and improvements to the cards would multiply the result. GPUs currently offer 300+ TFLOPS, but loading/unloading of models would presumably be quicker if split over multiple PCIe slots. It could also add options for non-ROCm AMD owners, though compatibility with Torch would need researching. Note: it would be the next PowerVR card rather than the next 3dfx, because 3dfx cards required pass-through cabling, while PowerVR cards had no video ports.
Funny that the Lenovo booth PC has the "Activate Windows" message on the screen. Come on Lenovo....aren't you going to pay for that copy of Windows you're using to demo?
TFLOPS is floating-point operations per second, with 16-bit or 32-bit floats. TOPS is usually INT8, because for inference you can usually get away with INT8. Training AI is a different story and needs floats. So basically all players market TOPS, because people will be running already-trained AI models, and INT8 is much more energy- and space-efficient.
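To put some rough numbers on that, here's a minimal back-of-the-envelope sketch; the 7B parameter count is just an example, not a claim about any particular product:

```python
# Rough memory cost of the same model at different precisions.
params = 7_000_000_000  # example parameter count, purely illustrative

for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB just for the weights")

# FP32 ~28 GB, FP16 ~14 GB, INT8 ~7 GB -- which is why inference hardware
# gets marketed in INT8 TOPS while training sticks to floating point.
```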
Seems like the second one is being used to generate images with Stable Diffusion, so you can easily compare its performance against current GPUs in Stable Diffusion.
MemryX's product is based on their own MX3 chip, which can hold about 10 million parameters per chip, with 8 chips per M.2 package. That's still 88 M.2 devices (roughly 350 PCIe lanes) to address at once for a wimpy 7-billion-parameter language model. In an ideal world these chips would each address their own GDDR6X pool, but I don't know their architecture.
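Taking the figures quoted above at face value (10M parameters per chip, 8 chips per module, a typical x4 M.2 slot each), the arithmetic does check out, roughly like this:

```python
import math

params_needed = 7_000_000_000   # a 7B-parameter LLM
params_per_chip = 10_000_000    # figure quoted in the comment above
chips_per_module = 8
lanes_per_module = 4            # typical M.2 x4 slot

chips = math.ceil(params_needed / params_per_chip)    # 700 chips
modules = math.ceil(chips / chips_per_module)         # 88 M.2 modules
print(modules, "modules,", modules * lanes_per_module, "PCIe lanes")
# -> 88 modules and 352 lanes, matching the ~350 quoted above.
```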
Unless the desktop doesn't have a dedicated GPU, I don't see their long-term usefulness. AMD just announced a desktop APU with an integrated NPU, and Intel will have something for desktop with Arrow Lake. Even without those, Nvidia will tell you every desktop with an RTX 30 series onwards already has on-device AI. On desktop we don't really care about efficiency since we're plugged in.
Yeah what can be integrated into CPUs will be better than this. The new Snapdragon chips are probably stronger and the new androids will all have it by the end of the year.
@@marshallmcluhan33 What can be integrated into CPUs may be less upgradeable. How rapidly are we expecting this technology to evolve? Old-school CPU technology is not likely to evolve anywhere near as fast. And on the GPU front, I don't see these NPU cards replacing GPUs anytime soon, if ever. GPUs are already heavily optimized for parallel operations and are generally being marketed as AI cards anyway. NPUs are more likely to replace classical CPUs for operations that a machine learning algorithm can optimize for parallel computing better than an intern seeking a degree in computer programming. So they are more likely to replace the CPU than the GPU.
@@marvinmallette6795 Well, there are lots of software optimizations still being done; quantization can help AI models run on legacy hardware, for example. In terms of hardware, Windows is doing a push towards ARM chips. Chiplets and RISC-V are things that could unlock better hybrid CPUs. Also we might see a new AVX instruction set made specifically for popular AI applications. There's a large change coming to laptop and desktop CPUs.
How do they solve for memory bandwidth? I'd gladly buy an AI accelerator card if it has access to high memory bandwidth. The more the better. If they can make a product with access to 128GB of memory and 60 TOPS then I'm down to buy it. It needs to have good software support too.
I don't really see the point in these. People are saying "great, now my graphics card is free for my games while I run AI", but in actuality you're still competing for the same VRAM, in $$$ if nothing else. Until games that actually use AI while running become dominant, I can't imagine most people doing heavy AI generation while tying their graphics card up with running a game on the same platform. It'd make more sense to have two separate computers if that were the case. Seems like they have to find a niche to fill, maybe some commercial application that wouldn't normally call for a graphics card but has a need for AI.
There's something like this for desktop but it's built in, and it's on Apple Silicon, isn't it? Is this modular one going to be able to scale with how many you install, and is any CPU able to talk with them directly?
Is it really the same kind of FP16/FP32 5-10 TFLOPS per card, generally accessible with OpenCL, as with AMD/Nvidia GPUs? So far I am dubious of the claims. It also doesn't seem to have any memory outside of the on-die 10MB, so you can run only shitty tiny models that can be easily handled by a CPU anyway.
The big problem for desktops is not AI computing power, but having enough RAM with high enough bandwidth to the GPU and/or CPU (or a dedicated AI chip) to load the models into. Just for perspective, a 4090 with 24GB has too little VRAM for many LLMs. Any serious AI accelerator would need huge amounts of fast RAM to outcompete a GPU. And again, today's GPUs are severely VRAM-size limited compared to what we need for decent-performing LLMs such as Mixtral, which in its original size needs about 48GB. Even a 1080 Ti could run Mixtral OK-ish when it comes to speed, if it had enough VRAM... I highly doubt these accelerators will be able to run any decent LLM. The only one going the right way so far is Apple, with its M2 with 192GB of RAM and 800GB/s of bandwidth. Great for inference of big models. Not the fastest, because it's an M2, but it outperforms any consumer hardware because of RAM size and bandwidth. A 4090 would be much faster, but cannot fit big models in its RAM. All AMD and Nvidia would have to do is give consumer GPUs huge VRAM increases. But they probably won't, because that would make their server GPUs less attractive.
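For anyone wondering where the "about 48GB" figure comes from, here's the rough weight-only math; the parameter count is the publicly quoted ~47B total for Mixtral 8x7B, and KV cache and activations come on top:

```python
def weight_gb(params_billion, bits_per_weight):
    # Memory needed just to store the weights, ignoring KV cache and activations.
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"Mixtral 8x7B (~47B params) at {bits}-bit: ~{weight_gb(47, bits):.0f} GB")

# ~94 GB at FP16, ~47 GB at 8-bit (the "about 48GB" above), ~24 GB at 4-bit,
# which is why 128-192GB of fast unified memory is attractive for big models.
```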
New AI accelerator cards are aiming to make a significant impact in the tech industry, reminiscent of the game-changing influence 3dfx had in graphics processing. Designed to enhance AI performance and efficiency, these cards could revolutionize machine learning tasks and become pivotal in driving future innovations.
People are misunderstanding what this card is doing. This is like an NPU (or Apple neural engine) which runs matrix multiplications hardware accelerated with efficiency in mind.
Anyways, if you want to do anything ML related right now, the best option is to get a Mac since the Apple silicon Macs have unified memory which serves the purpose of both RAM and VRAM. I can run 70b models on my work 64GB M2 Max MBP 14" which is just bonkers. Sure that thing costs $3700 if bought new from Apple directly. If you want the absolute best, you can get an M3 Max 128GB unified memory MBP 14" for $4800 which is steep but with 128GB VRAM you can run 120b+ models like Goliath (Goliath takes up 70GB VRAM, and Goliath is amazing).
I can see it as a long term investment and if you are a developer, buying a Mac studio Ultra or a MBP with a ton of memory makes sense if you are thinking 5+ years in the future. Macs go up to 192 GB for the ultra btw. The inference speed is also quite nice. My RTX 3070 only has 8GB VRAM for example, but my personal M2 MBA 24GB has memory equivalent to a 4090 (which I got for $1600 on ebay). Also considering how efficient Apple silicon is, you are going to save a lot of money by running your inferences and training on a Mac in terms of energy costs.
A low-power but high-memory GPU-like card specialized for ML, like a TPU but with proper Torch support and something like 32GB of fast memory, would be really interesting.
Will these boost VRAM for AI applications or only processing speed? I'd love to see these have like 4, 8, or 12GB of VRAM that can combine with your main GPU, and also boost some speed. It would be great if they could be used for Stable Diffusion but also local LLMs.
I'd love to just have a compression/decompression add-on card that is 50x faster than a "hot" multicore CPU. Something I could hand a 1/2TB VM file to and have it squeezed down in 2 minutes.
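Quick math on what that wish implies; this is just the throughput the job requires, not a claim about any particular product:

```python
size_gb = 500    # roughly a 1/2TB VM image
seconds = 120    # the 2-minute target

print(f"~{size_gb / seconds:.1f} GB/s of sustained compression throughput needed")
# ~4.2 GB/s -- and the source and destination storage plus the PCIe link
# have to keep up, so the card is only part of the equation.
```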
So... if you only want AI stuff like Stable Diffusion or an image background remover, etc., and you don't play 3D games, you won't need to buy an expensive GPU any more? How about the memory? GPUs cannot use system memory and must use their own VRAM. Do these AI accelerator cards have their own RAM or do they use system RAM?
I was kind of under the impression AI had space to learn and grow in size. So how is this PCIe accelerator card artificially intelligent? I'm not allocating HDD/SSD space for it.
If these dedicated NPUs are compelling enough, I imagine Nvidia will just add them into their GPUs rather than have us buy yet another dedicated additional card.
Seeing as M.2 NPUs are on display here, I would expect significantly less power. It might be intended as an upgrade path forward for "legacy" x86 IBM/ATX computers. GPUs may be less marketable as an upgrade, being expensive, loud/thermally demanding, and having a reputation for being for "gamers" and not general-purpose office work. I see them more likely having an advantage over x86 Intel CPUs running legacy x86 code, which is centered around ordered instructions across a limited number of cores. NPUs would provide a larger "core count" for heavily parallel workloads that could be executed out of order with the oversight of an AI engine.
PC-building experts will likely need to review the hardware to ensure there is enough PCI Express bandwidth for proper operating system stability. It seems like just "any PC" would be likely to suffer crashes and reboots due to SSD disconnects should the shared PCI Express bus experience excessive congestion. It does look from the thumbnail that such is the notion: separate M.2 hardware compatible with any PC that has the shared PCI Express bandwidth available... Probably best not to pair it with an NVIDIA GeForce or AMD Radeon graphics processor.
Imagine having a last-gen PC that can't run new Unreal Engine 5 games smoothly; buying one of these pretty much makes it a viable solution, instead of a whole expensive PC overhaul.
This topic has not received the attention that it really should have: dedicated M.2 AI AICs mean ANY PC can run an NPU at a relatively low upgrade cost. These need to be priced accordingly and be widely available, like SSDs.
3dFX - dang, that takes me back 20+ years! I had two running in SLI. Played Quake like no other! Great video!
Same over here, Great times
I had 2 of the 12MB Voodoo 2's. Quake and Quake 2 at 2048 x 1024 on a 17" 4:3 CRT. Mmmm so good. 😙👌
Lol
Not possible with the Glide 3dfx driver, even with SLI
The max using 2 SLI cards was 1024x768 @@thelaughingmanofficial
@@thelaughingmanofficial I had a friend with a similar rig. All the anti-aliasing in the world couldn't match just using more pixels back then.
2048x1024 huh? Lol. @thelaughingmanofficial
They want to decline and be bought on the cheap by Nvidia?
This
Lenovo is a Chinese company & China is seeing countries limit its supply of AI hardware..
Lol yeah not sure why you would want to be the next 3Dfx and only exist for a brief time and then become history. 😂
Thought exact same.
3Dfx failed because of a bad investment decision at a bad time; it was poor management and bad luck.
Two key things AI accelerator cards will need to provide: processing AND memory. I can foresee an accelerator removing the need for graphics cards to share VRAM with LLM models. It's an exciting opportunity for card vendors and chip vendors... imagine an AMD or Nvidia AI card with 24GB+ of "IRAM" (Inference Memory), leaving video cards to devote their AI processing to rendering duties, while the AIA handles the AI for characters (actions and dialog) as well as interpreting speech input from the player. Of course, there is much more it could do, I'm just knocking the big obvious things out there now.
They will just integrate it into existing GPUs.
@@PySnek At the cost of performance and precious VRAM? Again, it's not a bad thing that GPUs have AI support (makes sense for DLSS/XeSS/FSR), but why should a game's render needs compete with an LLM taking up space and processing? A good AI accelerator would probably only need 4 PCIe lanes and enough memory to host whatever models may be thrown at it. You could drop it in with ANY GPU/CPU and get an immediate benefit from games and apps that leverage AI.
@@wtflolomg Why? Because Nvidia wants the whole cake and has enough resources and brains to reach that goal.
@@PySnek Nvidia is far better off creating a whole new category of PC add-ons. An AI card won't cannibalize GPU sales; it will add to Nvidia's bottom line by introducing another thing people will purchase for their PCs, and improve the performance of their GPUs in games, since the AI won't be leeching performance. This isn't rocket science, it's simple opportunity. LLMs have NOTHING TO DO with graphics. The only reason we are using GPUs to process LLM models is because of the tensor processing on GPUs - and in the process, STEALING PERFORMANCE. Separation is logical and inevitable.
I'd rather call it AIRAM; limiting it just to inference and not training as well is a bit... well, limiting.
This feels a lot like what Ageia did for PhysX before Nvidia bought them and incorporated the tech into their GeForce GTX line.
And heavily nerfed the application... seriously, if you ever tried a real PhysX card vs. what Nvidia has released even to date, nothing compares to the original demos with the hardware PhysX; it was amazing.
Just not true lmao, even GPUs just a generation or two after that far outstripped the PhysX cards @@hentosama
@@hentosama Problem is, not many are using PhysX in games because it's not cross-compatible with other GPUs, and something like UE5 has physics simulators that are frankly good enough for most people and that are easy to implement across an entire range of computers rather than just NVIDIA GPUs. PhysX is still used in non-gaming applications, though.
Wow, the last time I heard about accelerators it was the math co-processor, then 3dfx, then PhysX; now we have AI accelerators. What a massive leap in the space of 25 years.
The march of technology never ceases to amaze.
And two of those were bought and absorbed by Nvidia so I'm looking forward to this happening again
Still a math co-processor, just that instead of just floating point it focuses on linear algebra
It's just code for enterprise gear.
Seagate WarpDrive? Flash accelerator.
Headless Quadro/Tesla/Radeon/Arc GPU? Graphics accelerator.
nVidia Data processor? General purpose network/storage/service accelerator.
Those cool encoder cards that can do like x264 very slow but LIVE? Specialized accelerator.
There's a bit of AI potential in most of these and a lot of it is lost on the public because nobody notices any of it in the traditional desktop space. Also for what it's worth, GPUs are a co-processor.
Yeah, except PhysX was a proper card with power draw, cooling and drivers. This just screams snake oil. This looks like nothing more than memory chips and a controller on an M.2 board. There isn't even a heat sink for cooling like there is for an NVMe drive.
I am very excited about this. Anything that can accelerate local AI applications like Stable Diffusion and LLMs will be a game changer. If they can help accelerate and offload tasks from the GPU, these are going to sell. I am wondering if they will eventually allow for parallel processing or dedicated VRAM modules purely on these (to be truly dedicated offloading), as it would be nice to not even require a GPU that has CUDA (or other tech) or uses large VRAM sizes.
Sadly these are not for that. Without onboard VRAM they can't accelerate Stable Diffusion or LLM models.
They literally had Stable Diffusion running on the right monitor, slower than my laptop that doesn't even have a GPU. Without memory, these things are 100% trash.
@@jonmichaelgalindo First-gen tech usually is. I'm hoping they can parallelize them and add their own memory to improve their performance. This is something I'm sure they will eventually do.
@@CNC-Time-Lapse I don't think that will happen; these are just niche products for visual recognition. Stable Diffusion can run marginally well on almost any GPU. LLM models are the real problem, where it's ridiculously expensive to run them if you don't have your own hardware. You either have to wait 200 seconds or more if you run them on a CPU, or you have to get a 3090... better yet 2 of them, to run a good model and not wait a few minutes.
Imagine you had a GPU design with a spare M.2 slot on it that you could add an AI accelerator like these onto. Like a PhysX add-in card.
The actual problem with consumer GPUs is that there is not enough VRAM, not the power of the GPU, so adding more TFLOPS does not help at all. People add more GPUs because they don't have enough VRAM in 1 or 2 or even 3 GPUs. If the RTX 4090 came with 48GB of VRAM it would have been much better for AI acceleration than it is now.
For training large models that's definitely the case, but for running inference with smaller, more efficient models you don't necessarily need the same capacity as large GPUs. Most of the near-term use cases for _average_ consumers running local LLMs will be for things like chat assistants that can schedule tasks, summarize emails, generate images, etc. You don't need a ton of VRAM for that.
A lot of the popular AI services right now are running in the cloud, anyway; the things that are going to run locally will be more for privacy with limited training (meaning smaller datasets/fewer parameters), requiring a smaller buffer. The home enthusiasts of course will be driven to buy dedicated GPUs with more VRAM, but that's not really the primary market for these types of machines with these cards.
@@Slav4o911 A6000 and W7900 both have 48GB of VRAM... if you feel like paying 2-3 times as much for a GPU. 😄
I think it was Asus that demoed a lower-end GPU that had an M.2 slot meant for an SSD. The thinking is that the card isn't making full use of those PCIe lanes, so why not add in some system storage that can?
@@auturgicflosculator2183 Paying 3x more for 2x more VRAM is not a good investment. The problem is, the moment you go over your VRAM you get 10 to 20 times less performance. The most efficient AI GPU at the moment is a second-hand RTX 3090. Of course if you have an unlimited budget you go for the big Nvidia AI accelerators like the A100 and H100, but these are totally outside the consumer space.
An AI accelerator without a big chunk of VRAM is basically useless. These "accelerators" shown in the video are niche products catering to people who don't know what AI accelerators they actually need. They certainly don't need these M.2 accelerators; they would be very frustrated if they bought these hoping to run Stable Diffusion or LLM models on them. I see even in this comment section people don't understand what these things are and think they can run LLM models on these... nah, without VRAM these are useless. Efficient models just run in less VRAM, not without VRAM; these "accelerators" don't have any VRAM, and going through 4x PCIe to regular RAM will be extremely slow.
These "AI Startups" all aim for one thing: to get acquired - otherwise file for Chapter 11!
It's strange how the author compares this to 3dfx. There are so many things wrong with that analogy.
For one, no one knew how much demand there would be for graphics cards to play games.
No one knew (in the mainstream) how computing would take over.
Whereas everyone knows how important and how big AI will be, and thus how lucrative it will be.
Secondly, the other big thing wrong is how powerful Nvidia is and how they will make sure they keep their AI crown.
The only reason they are valued at that level is because of demand for AI. So a startup that has 30 million or whatever isn't competing with Nvidia. They're already working out how the next 5-10 years of this will pan out.
They already have an advantage with their own tech to guide them.
So no, Intel, AMD and Nvidia are already making sure to get that piece of the pie.
@@percy9228 You make good points. Nvidia actually learned about their gaming cards being used for computing way back. That's why they built CUDA a decade ago. The point with startups in this field is that they can't survive 5 years, let alone 10 years without continued funding, which is difficult. That's why their best bet is to get noticed by Intel or AMD and get acquired in my opinion.
@@percy9228 I sincerely concur with your assessment. However, there are exceptions: the "dominant tech" may come out of open-source-based AI tech startups that won't survive even 3 years.
I would want something like this to bypass the VRAM requirements of very large models, but apparently this chip only takes 4x10MB of model size, which unfortunately makes its use extremely limited if it doesn't actually let you run stupidly large models... :(
It can't even run small models; this AI add-on card is not for running LLMs.
@@Slav4o911 It's really weird, though. If this thing actually performs like 40 teraflops and you can't use it on large models, it would run literally anything it can run at like 1000 fps. Thus I am inclined to think the 10MB might be more like a cache instead of RAM.
@@Alice_Fumo It doesn't matter how fast it is, a GPU is faster than this yet when there's not enough VRAM the model runs slow. These things are like GPUs without any VRAM, which makes their actual usability extremely limited.
@@Slav4o911 I would disagree on the usability argument. It just needs to outperform a CPU by something like 6x.
Not many people run models in CPU mode, but for example Mixtral on a Ryzen 5600X is almost usable. Same for Whisper v3.
@@Alice_Fumo I don't think 3 minutes of waiting for a single answer is "usable". Everything above 30 seconds is too much. Yes, you can run the model... but that does not help much; you can run all GGUF models on any relatively new CPU... but the waiting time gets progressively longer. Also, from the discussion on the forums it seems the bottleneck is the memory bandwidth and not so much the raw power of the CPU. Also, most models have specific optimizations for Nvidia hardware and are not well optimized to run on the CPU or other GPUs. That's why the RTX 3060 is as fast as the 7800 XT in Stable Diffusion and is much faster in LLMs (if you can compile and run them at all); almost nobody runs LLMs on Radeon graphics cards, because they have to know a lot about programming and Python code, otherwise they might not even be able to run the model. In the consumer space, at least for now, only Nvidia is viable for AI.
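To put rough numbers on the bandwidth point above: single-stream token generation has to read essentially the whole model per token, so bandwidth divided by model size gives a ballpark upper bound on speed. The model size and bandwidth figures below are typical round numbers, not measurements:

```python
model_gb = 4.0  # e.g. a 7B model quantized to roughly 4 bits per weight

for name, bandwidth_gbps in [("dual-channel DDR4", 50), ("RTX 3090 GDDR6X", 936)]:
    print(f"{name}: ~{bandwidth_gbps / model_gb:.0f} tokens/s upper bound")

# ~12 tokens/s from system RAM vs ~234 from GPU memory -- the bandwidth gap,
# more than raw compute, is what separates CPU and GPU inference speeds.
```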
The Coral TPU from like 2019 had something like 5 TOPS and was under $100, though that may have been more of an ASIC, because every time I saw it referenced it was about image recognition, though up until ChatGPT that was mostly what people were using AI for.
I used PhotoPrism for the same thing: identify, organize and search my photos.
They came down in price a lot. The M.2 version (PCIe, as this guy refers to it) is dramatically more compact and only $25. One with a pair of those chips (so same performance as the one he's holding) is only $40 now.
The dual Coral M.2 one is $39 MSRP with 8 TOPS (4 per chip). Also it's only 22 mm x 30 mm (M.2-2230-D3-E)!
Coral support seems to have stopped like 3 years ago. The Coral chip has 8MB of memory, which limits model size significantly, and they operate in 8-bit integers only. The dual Coral requires 2 PCIe lanes, which most motherboards don't provide (so you'll only be able to use one TPU out of the two).
@@Archie3D Most M.2 slots are x2-x4, but yes, while a bifurcation card would physically mount 4 of the TPUs for 8 total, you could only address one of them, because most boards do not support x2 bifurcation, only x4/x4/x4/x4 or x8/x8, some x8/x4/x4.
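For anyone curious what driving one of these Coral TPUs actually looks like, here's a minimal classification sketch based on my recollection of the pycoral examples; treat the exact helper names as an assumption, and the model/image file names are placeholders:

```python
from PIL import Image
from pycoral.utils.edgetpu import make_interpreter
from pycoral.adapters import common, classify

# Hypothetical INT8 model compiled for the Edge TPU, plus a test image.
interpreter = make_interpreter("mobilenet_v2_edgetpu.tflite")
interpreter.allocate_tensors()

image = Image.open("photo.jpg").resize(common.input_size(interpreter))
common.set_input(interpreter, image)
interpreter.invoke()  # inference runs within the chip's ~8MB of on-chip memory

for c in classify.get_classes(interpreter, top_k=3):
    print(c.id, c.score)
```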
Notice NPU percentage wasn't shown. My guess is you don't use it much.
I can remember "AI accelerator" cards being available for a while now. Even in M.2 format.
Maybe the difference is that they've added Windows support, instead of it just working on Linux.
This.
This is not new by any stretch. It's just that what we have is ungodly expensive and comes in different form factors.
Guess someone has to play on the craze though
@ts757arse I clearly meant "craze" as in capitalizing on the buzzword.
This hardware isn't exactly new. I'm not even sure the form factor is totally new, considering you can find M.2... anything, if you need it.
I was looking into dedicated accelerators before, and AMD has the Alveo, I think they're called. That's the obvious one. Everything else is custom solutions.
Add how everyone's launching CPUs with dedicated AI hardware, and honestly I have to wonder where these really fit in overall, outside of special cases and "extra because why not." I guess there are also the folks who simply refuse to update hardware overall but happen to have an open slot, but even then, if they're serious about this, step into the future already instead of quarter-stepping. I can't imagine this would be useful without at least 64 gigs of system RAM.
Cool! They are showing us something that an NPU can be used for! I think I like the idea of having that be a separate device and not be taking up space inside of a CPU.
M.2/PCIe (even 5.0) is relatively slow though...
@@TazzSmk yeah but that doesn't hurt the GPUs with AI. Why do we need this in a CPU?
@@robertlawrence9000 bandwidth is very high on die
@@robertlawrence9000 I guess it depends on scale; most GPUs designed for AI use HBM and tons of VRAM, where something like a "casual" 24GB of GDDR6X is uselessly slow and inefficient.
I can't really answer why we need AI units in a CPU, but smartphone manufacturers have been doing that for years, and Apple now does it in their Macs, and it seems to be fairly efficient.
@@robertlawrence9000 An NPU is faster at processing basic AI than either a CPU or GPU; it makes sense to have it located centrally.
Aren't the Coral AI cards the first "era" of AI accelerator cards?
thought so too, and they have a few different variants with different connector keying
TPUs are optimized for specific machine learning algorithms and use cases, while MPU AI accelerators can handle a wider range of AI workloads, but may not be as efficient at specific tasks as TPUs. But I guess the key difference is that TPUs are proprietary hardware developed and used primarily by Google, while MPU AI accelerators are available from a variety of vendors and can be used on any computer with an MPU processor. Another problem is that TPUs are mostly supported on Linux systems.
It is also true that some MPU AI accelerators do have dedicated memory specifically designed to handle AI workloads. This "on-chip memory" or "local memory" is designed to store the data and intermediate results of the AI algorithms being processed by the accelerator. This allows the accelerator to access the data it needs quickly, without having to rely on the main system memory.
@@365tage7 Is "MPU" a new thing I haven't heard of or are you just consistently misspelling "NPU"?
@@smorrow MPU is short for Main Processing Unit; NPU means Neural Processing Unit. Those are different kinds of accelerators. An MPU is for multiple tasks. An NPU is specialised for certain tasks in NNs or ANNs.
Wow, awesome coverage! I had no idea this technology was unveiled at CES this year, nobody else seems to have covered it. Thanks for shedding some light on this, I truly appreciate it. Stuff like this makes me really excited, I would love to have an AI accelerator card in my PC!
They wanna be the next PhysX?
These are going to be important. I'm glad all the system builds I've done for myself, an internet cafe, and a school have 2 NVMe ports, and all the better-quality boards I bought have 3 (all 500-series AM4, either B550(M) or X570), so those 3-NVMe-port boards can run dual OS and STILL have an NPU.
It won't matter for the next couple years, but 3+ years it will.
This makes sense, I guess, for adding AI-specific compute power, but I still fail to see or understand how "AI" is any different from any other program or algorithm. If a code base is sufficiently large with enough options and variables, then it's just a matter of picking/placing the correct variable with the correct option, just like any program always has. If it's really any different than that, well then I guess I have a lot to learn.
Cheers 🍻
It’s more that you can create a chip that’s optimized for a specific algorithm. Same way that you use a GPU and a CPU for different operations. You COULD use a CPU to perform graphics operations but it would be very slow.
Well, just like graphics can also be run on any CPU, AI can also be run on any CPU. But that doesn't mean it's going to be ideal. CPUs are good as general-purpose processors, but since their internals need to be generic enough to be used for anything, they will never be as efficient or optimized for a specialized task as other pieces of hardware built for it. GPUs started becoming a thing because loads of parallelized vector math benefit from a focus on different kinds of instructions and a different overall processor layout. And it just so happened that GPUs were better suited for AI than CPUs out of the box. However, GPUs still contain a lot of stuff that AI doesn't really need, hence why specialized AI accelerator cards can make sense.
Whether or not it will be a big enough difference to warrant separate hardware vs. just keeping it on a GPU, only time will tell. Similar developments happened with ray tracing and before that PhysX, and those all just got incorporated into the GPUs after all.
My layperson explanation: Artificial Intelligence/Machine Learning is largely different from traditional computing algorithms in that it is self-training and stochastic, meaning that it can at least approximate some level of human-like inference. In other words, AI can often (not always - "hallucinations" being a more visible failure) successfully carry out a task in different ways without being deterministically programmed with those decision routes. This is why something like Stable Diffusion or ChatGPT isn't going to necessarily spit out the same image or answer every time, even if given identical prompts: someone didn't program each and every output - the AI model is arriving at those different results on its own.
The reason why you might want dedicated AI hardware is similar to the reason why dedicated graphics were developed even though CPUs can technically do the same functions: hardware that is purpose-built tends to be more efficient and will have resources free just to do the tasks they are given.
AI/ML like Large Language Models (LLMs) rely on matrix multiplication. GPUs happen to be pretty good at this - this is why Nvidia is ruling the roost right now in terms of AI hardware - but a GPU might be busy with other operations, especially if it's a low-end GPU in a power-constrained laptop or an all-in-one desktop with only integrated graphics. CPUs can also perform AI functions, but even less efficiently than either a GPU or dedicated AI hardware might (some locally running AI software will let you run on CPU, but it can be painfully slow to do that).
Right now it's not a big deal to just run AI tasks on GPU, but given a very near future where video games, streaming software, local assistants, etc might all be competing for resources on desktops, it makes sense to at least start off exploring dedicated AI hardware as an accelerator. We have seen some dedicated accelerators pretty much go nowhere in the past - PhysX comes to mind - but AI/ML does have a lot of burgeoning use cases, even for home users.
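To make the matrix-multiplication point above concrete, here's a tiny sketch of the kind of operation that dominates an LLM's forward pass; the sizes are made up for illustration, and NumPy on a CPU is just the baseline that dedicated hardware aims to beat:

```python
import time
import numpy as np

hidden = 4096   # hypothetical model width
tokens = 64     # hypothetical batch of token positions

activations = np.random.rand(tokens, hidden).astype(np.float32)
weights = np.random.rand(hidden, hidden).astype(np.float32)

start = time.perf_counter()
output = activations @ weights  # the matmul that GPUs/NPUs accelerate
elapsed = time.perf_counter() - start

flops = 2 * tokens * hidden * hidden  # roughly 2*m*n*k operations per matmul
print(f"~{flops / elapsed / 1e9:.1f} GFLOP/s on this CPU for one layer-sized matmul")
```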
From my non-expert knowledge (anyone correct me if I am wrong).
The thing with these machine learning algorithms is that, in a way, they are a lot easier to implement than writing a specific algorithm. Sometimes that is nigh impossible (like writing a deterministic, more "traditional" algorithm for a large language model). You, more or less, give training data, and the model gets better and better over time at mimicking the desired results.
Training a ML (machine learning) algorithm to do basic math is sort of worthless, since we already know how to do it in computer science and math. Determining the sentiment of a review and extracting key words or generating an aesthetic image based on a prompt? Much more difficult.
ML is essentially a general brute force sort of method of solving certain kinds of problems that are either too difficult or costly to do in a different way.
Also to note, there are a variety of different ML techniques as well, which are better suited to different kinds of tasks.
LLMs (large language models, like ChatGPT) are increasingly popular and focused on because of their ability to do a lot of language-understanding tasks fairly well. And with some newer models also being able to make API calls and execute code, they can in theory do a lot. It is the ultimate generalist.
This is sort of why people were and still are excited for Boston Dynamics robots. There are better specific ones, but a generalist one adds a lot of flexibility, since hopefully they can do most tasks that humans can (in the physical sense).
Dedicated NPUs could maybe be useful for keeping costs down when building a system that needs to do ML computations without buying a more expensive setup, or in something that needs to be energy efficient (drones or RC-type stuff). On desktops they are targeting a small niche, IMO, at least currently, because if you wanted to do AI/ML stuff, you would either buy a GPU or just run it slowly off a CPU with more RAM. It seems, at least currently, like a weird middle ground, which isn't even that middle (closer to running on a CPU than to a GPU with large VRAM).
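As a toy illustration of the "give it labeled examples instead of writing rules" idea from the sentiment example above, here is a minimal scikit-learn sketch; the reviews and labels are invented, and a real sentiment model would be trained on far more data:

```python
# Toy sketch: learn sentiment from labeled examples instead of hand-written rules.
# Assumes scikit-learn is installed; the data here is tiny and purely illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "great product, works perfectly",
    "absolutely love it",
    "fast shipping and solid build",
    "terrible, broke after a day",
    "waste of money",
    "very disappointed with the quality",
]
labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)          # no decision rules written by hand, just examples

print(model.predict(["works great, love it"]))  # learned from the examples above
```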
@@cxsey8587:
Yeah, exactly.... Or rather that's what I'm thinking. No different than a CPU vs ARM vs GPU vs APU, it's all just specific hardware and the specific code that drives it, same as it ever was. To me, "AI" just sounds like a new term for some old common everyday tasks. Just like "The Cloud" is a server and storage, the same as it ever was, but Microsoft gave those two components a new name and now it's something special!?
Cheers 🍻
That first demo is pretty much what we were doing with the Xbox Kinect over 10 years ago :) Not sure why it needs an AI chip.
Intel has a stand-alone NPU on a PCIe M.2 card already, but it's not something you can easily just buy. I think the best application for these NPUs will be to upgrade older PCs lacking an NPU, or eliminating expensive GPUs from machines focused solely on AI.
You can't do that because there is not enough bandwidth from the PCIe bus to the RAM. GPUs have plenty of power, but the moment you offload part of the model from VRAM to RAM, there is a very big slowdown. I'm running LLMs, and the moment they go outside of VRAM they become progressively slower, even if only, for example, 20% is "outside" of VRAM. For example, the same model takes 10 seconds if fully inside VRAM and 20 seconds if partially outside VRAM... if it's 50% outside it's 60 seconds... and so on... working only on the CPU it's 200 seconds...
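A rough back-of-envelope sketch of why partial offload hurts so much: token generation is largely memory-bandwidth-bound, so the slice of the model sitting in slow system RAM dominates the per-token time. The bandwidth and model-size numbers below are illustrative assumptions, not measurements of any particular setup:

```python
# Back-of-envelope sketch of partial-offload slowdown.
# All numbers are illustrative assumptions, not measurements.
model_gb  = 13.0     # hypothetical model size (weights streamed roughly once per token)
vram_gbps = 900.0    # assumed GPU VRAM bandwidth
sys_gbps  = 60.0     # assumed system-RAM / PCIe-limited bandwidth

def ms_per_token(offload_fraction):
    in_vram   = model_gb * (1 - offload_fraction) / vram_gbps
    offloaded = model_gb * offload_fraction / sys_gbps
    return (in_vram + offloaded) * 1000  # seconds -> milliseconds

for f in (0.0, 0.2, 0.5, 1.0):
    print(f"{int(f*100):3d}% offloaded: ~{ms_per_token(f):6.1f} ms per token")
```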
That depends on the price.
@@Slav4o911 What's your setup?
I'm thinking more PhysX than 3dfx.
Not for long; AI accelerator cards will expand in performance and also in size, so that means in the future we might have an AI card the size of a graphics card.
"We're looking at the Lenovo ThinkPad, which is a desktop"... No, nope, you're not looking at a ThinkPad which is a desktop... You're looking at a ThinkCentre... which is a desktop... A ThinkPad is always a mobile device...
So what does a consumer get out of an AI accelerator card exactly? Like, what's the WIIFM (What's In It For Me)?
You mean you don't know?
The first gen of these, probably nothing. But that was also true of early SSDs with minuscule capacities and high prices. In the future, though, they might just be another add-in card to handle AI compute loads more efficiently.
Right now most useful AI is prosumer level, but as hardware proliferates then we should see more software proliferate to use it. Chicken-and-egg. But it's early days, otherwise.
If you run an open-source, locally hosted home security system with something like Frigate, adding an AI card lets you add shape recognition.
You get to spend money on something you don't need to support the hype.
An ideal AI accelerator for me would generate more tokens per second in large language models (Llama 2, Mistral, etc.) and faster image generation with Automatic1111/Stable Diffusion. If it had enough RAM/power, it could also be used to train or fine-tune AI models.
Feels like a solution looking for a problem at the moment. I could see these things being used to enable some really crazy procedural experiences in games, but it feels like that could be done with the tensor cores on a GPU as-is.
If Nvidia keeps being this stingy with VRAM their tensor cores won't help.
This kind of product has already existed in the Chinese market for years... The first major product was released around 2019, I think, and provided 8 TOPS of AI computing performance.
I wonder if this technology could be used to bring ray tracing to older hardware or hardware that's incapable of ray tracing.
The big problem with AI accelerators is that you need a ton of VRAM for useful AI (I have 24GB GPU and it's barely enough); which is likely the limiting factor in terms of price. Though this might have a market for Raspberry Pi kind of AI stuff.
Difference between this and Google's Coral TPU?
TPUs are optimized for specific machine learning algorithms and use cases, while MPU AI accelerators can handle a wider range of AI workloads, but may not be as efficient at specific tasks as TPUs. But I guess the key difference is that TPUs are proprietary hardware developed and used primarily by Google, while MPU AI accelerators are available from a variety of vendors and can be used on any computer with an MPU processor. Another problem is that TPUs are mostly supported only on Linux systems.
It is also true that some MPU AI accelerators do have dedicated memory specifically designed to handle AI workloads. This "on-chip memory" or "local memory" is designed to store the data and intermediate results of the AI algorithms being processed by the accelerator. This allows the accelerator to access the data it needs quickly, without having to rely on the main system memory.
I would call it the "next step in evolution" after SSDs and Ray Tracing , not the "next 3Dfx". By the way, I gamed on my 3Dfx Voodoo 2 just this week 🙂
So my Coral TPU ain't cutting it anymore? :)
Sorry, so you need a dedicated motherboard to plug this thing into? Will they work in tandem? Could we externalize these and jack them through a dedicated port?
Can't wait for the must-have gamer AI accelerator cards with AI RGB.
Are there more details about the performance with Stable Diffusion you can see on the right-hand screen? An accelerator for Stable Diffusion or LLaMa would be very interesting!
The memory limitations are the main problem with add-in AI cards. Large AI models require a lot of memory (2GB-200GB).
This would be useful with multi-M.2 adapter cards. Presumably it would run on system RAM, so the DDR speed would have to be high, and improvements to the cards would multiply the result. Alternatively, GPUs currently offer 300+ TFLOPS, but loading/unloading of models would presumably be quicker if split over multiple PCIe slots. It could also add options for non-ROCm AMD owners, though you'd have to research compatibility with Torch. Note: it would be the next PowerVR card rather than the next 3dfx, because 3dfx cards required passthrough cabling, while PowerVR had no ports.
Funny that the Lenovo booth PC has the "Activate Windows" message on the screen. Come on Lenovo....aren't you going to pay for that copy of Windows you're using to demo?
TFLOPS is floating-point operations per second, with 16-bit or 32-bit floats. TOPS is usually INT8, because for inference you can usually get by with INT8. Training AI is another story and needs floats. So basically all the players market TOPS, because people will use already-trained AI models, and INT8 is much more energy- and space-efficient.
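A minimal sketch of the FP32-to-INT8 step that the TOPS marketing leans on, using plain NumPy and symmetric per-tensor quantization; the matrix size is arbitrary, and real toolchains use more sophisticated schemes:

```python
# Sketch: map trained FP32 weights to INT8, cutting memory (and multiplier width)
# by 4x versus FP32, usually with little accuracy loss for inference.
import numpy as np

w = np.random.randn(1024, 1024).astype(np.float32)   # stand-in for trained weights

scale  = np.abs(w).max() / 127.0                      # symmetric per-tensor scale
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_back = w_int8.astype(np.float32) * scale            # dequantized for comparison

print(f"FP32: {w.nbytes/1e6:.1f} MB, INT8: {w_int8.nbytes/1e6:.1f} MB")
print(f"max abs quantization error: {np.abs(w - w_back).max():.4f}")
```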
'The next 3dfx' sounds like 'the next Hindenburg'.
Seems like the second one is used to generate images with Stable Diffusion, so you can easily compare its performance with current GPUs in Stable Diffusion.
MemryX's product is based on their own MX3 chip, which can handle 10 million parameters per chip; at 8 chips per M.2 package, that's still 88 M.2 devices (roughly 350 PCIe lanes) to address at once for a wimpy 7-billion-parameter language model. In an ideal world, these chips would be able to address their own GDDR6X pools per chip, but I don't know their architecture.
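Redoing that back-of-envelope math explicitly, using the figures from the comment above (not vendor specs) and assuming 4 PCIe lanes per M.2 slot:

```python
# Re-deriving the commenter's estimate with their own figures.
params_per_chip  = 10_000_000        # 10M parameters per MX3 chip (as stated above)
chips_per_module = 8                 # 8 chips per M.2 module (as stated above)
target_params    = 7_000_000_000     # a 7B-parameter language model

modules = -(-target_params // (params_per_chip * chips_per_module))  # ceiling division
lanes   = modules * 4                # assuming 4 PCIe lanes per M.2 slot

print(f"{modules} M.2 modules, roughly {lanes} PCIe lanes")  # -> 88 modules, 352 lanes
```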
Unless the desktop doesn't have a dedicated GPU, I don't see their long term usefulness. AMD just announced a desktop APU with integrated NPU, Intel will have something for desktop with Arrow Lake. Even without them Nvidia will tell you every desktop with an RTX 30 onwards has on device AI already.
On desktop we don't really care for efficiency as we are plugged in.
Yeah, what can be integrated into CPUs will be better than this. The new Snapdragon chips are probably stronger, and the new Android phones will all have it by the end of the year.
@@marshallmcluhan33 What can be integrated into CPUs may be less upgradeable. How rapidly are we expecting this technology to evolve? Old school CPU technology is not likely to evolve anywhere near as fast.
And on the GPU front, I don't see these NPU cards replacing GPUs anytime soon, if ever. GPUs are already heavily optimized for paralleled operations and are generally being marketed as AI cards anyway.
NPUs are more likely to replace classical CPUs for operations that a machine learning algorithm can optimize for parallel computing better than an intern seeking a degree in computer programming. So they are more likely to replace the CPU, than the GPU.
@@marvinmallette6795 Well there are lots of software optimizations that are still being done, quantization can help AI models run on legacy hardware for example. In terms of hardware Windows is doing a push towards ARM chips. Chiplets and RISC-V are things that could unlock better hybrid CPUs. Also we might see a new AVX instruction set made specifically for popular AI applications. There's a large change coming to laptop and desktop CPUs.
@@marshallmcluhan33 I see these M.2 add-on cards as a form of "hybrid CPU".
@@marvinmallette6795 Yeah these may be a stop gap.
Why is there Stable Diffusion running in the background?
How about a card with additional GDDR5/6 RAM on it, used as an additional VRAM frame buffer before falling back to system memory? Would that be possible?
It makes sense to iterate the development of AI hardware outside of the CPU/SoC upgrade cycle, and we'll produce less landfill this way.
How do they solve for memory bandwidth?
I’d gladly buy an AI accelerator card if it has access to high memory bandwidth. The more the better.
If they can make a product with access to 128GB of memory and 60 TOPS, then I’m down to buy it. It needs to have good software support too.
For a Raspberry Pi 5 over PCIe this would fit quite well. For example, tasks like speech recognition on the edge would be feasible.
How generic are these accelerators?
Can a workload meant for the NPU on, say, MTL (Meteor Lake) run on one of these?
I don't really see the point in these. People are saying "great, now my graphics card is free for my games while I run AI", but in actuality you're still competing for the same VRAM, in $$$ if nothing else. Until there is actually a dominance of games that actually use AI while running, I can't imagine most people would be doing heavy AI generation while tying their graphics cards up to running a game on the same platform. It'd make more sense to have two separate computers if that was the case. Seems like they have to find a niche to fill, maybe some commercial application that wouldn't call for a graphics card normally, but has a need for AI.
This will be fantastic for local LLM projects; GPU cards are still expensive.
This thing has 10MB of memory... and no onboard VRAM; good luck beating any GPU. Nvidia RTX cards have more cache memory than that.
LLMs need a lot of fast memory... nowadays you need 80GB+ to run an uncompressed LLM.
@@mirek190 You can use a Raspberry Pi with 8GB to run one; you need 80GB+ to retrain (actually you need more than that, and few people do it).
On a Raspberry Pi you can only run a 7B LLM compressed to extremes, like 4-bit Q4... better to use Q5_K_M or Q6 @@PracticalAI_
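For a rough sense of why those quantization levels matter on an 8GB board, here are approximate weight sizes for a 7B model; the bits-per-weight figures are ballpark values that vary by quantization implementation:

```python
# Rough 7B-model weight sizes at common quantization levels (approximate figures).
params = 7_000_000_000
approx_bits_per_weight = {"Q4_0": 4.5, "Q5_K_M": 5.5, "Q6_K": 6.6, "F16": 16.0}

for name, bits in approx_bits_per_weight.items():
    gb = params * bits / 8 / 1e9
    print(f"{name:7s} ~{gb:4.1f} GB (plus KV cache and runtime overhead)")
```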
Is this needed if a strong GPU is already in the desktop?
When will these NPU add-on cards be available to buy, and will they make more powerful versions, say 100 TOPS and more?
This is more like the NEC PCX2 chip; it was used in the Apocalypse 3Dx and Matrox m3D and was a PCI video accelerator card without video input or output.
Would love to see Mythic with its analog AI processor.
Google Coral chips have been available as m.2 cards and usb dongles for a year.
LLMs need a lot of memory. How does that work with accelerator cards like this?
It doesn't, it's a niche product for image recognition and things like that, it's not for LLMs or Stable Diffusion.
There's something for desktop, but it's built in and it's on Apple Silicon, isn't it? Is this module going to be able to scale the more of them you install, and is any CPU able to talk to them directly?
Is it really the same kind of FP16/FP32 5-10 TFLOPS per card, generally accessible with OpenCL, as with AMD/Nvidia GPUs? So far I am dubious of the claims. It also doesn't seem to have any memory outside of the on-die 10MB, so you can run only shitty tiny models that can be easily handled by a CPU anyway.
The big problem for desktops is not AI computing power, but having enough RAM, with high enough bandwidth to the GPU and/or the CPU or a dedicated AI chip, to load the models into. For perspective, a 4090 with 24GB has too little VRAM for many LLMs. Any serious AI accelerator would need huge amounts of fast RAM to outcompete a GPU.

And again, today's GPUs are severely VRAM-size-limited relative to what we need for decent-performing LLMs such as Mixtral, which in its original size needs about 48GB. Even a 1080 Ti could run Mixtral OK-ish when it comes to speed, if it had enough VRAM... I highly doubt these accelerators will be able to run any decent LLM.

The only one going the right way so far is Apple, with its M2 with 192GB of RAM and 800GB/sec of bandwidth. Great for inference of big models. Not the fastest, because it's an M2, but it outperforms any consumer hardware because of RAM size and bandwidth. A 4090 would be much faster, but cannot fit big models in its RAM. All AMD and Nvidia would have to do is give consumer GPUs huge VRAM increases. But they probably won't, because this would make their server GPUs less attractive.
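One way to see the bandwidth argument: for a memory-bandwidth-bound LLM, each generated token has to stream roughly all of the active weights once, so tokens per second is capped at bandwidth divided by model size. The figures below are illustrative assumptions, not benchmarks:

```python
# Crude upper bound: tokens/sec <= memory bandwidth / model size.
# Bandwidths and model sizes are illustrative assumptions, not measurements.
configs = {
    "Consumer GPU, 24GB VRAM (model must fit!)": (1000.0, 24.0),
    "Apple M2 Ultra-class unified memory":       (800.0,  48.0),
    "Dual-channel desktop DDR5":                 (80.0,   48.0),
}

for name, (bandwidth_gbps, model_gb) in configs.items():
    print(f"{name}: <= ~{bandwidth_gbps / model_gb:.0f} tokens/s")
```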
Wait, wasn't Coral the first to make an M.2 accelerator?
Thanks for the video!
New AI accelerator cards are aiming to make a significant impact in the tech industry, reminiscent of the game-changing influence 3dfx had in graphics processing. Designed to enhance AI performance and efficiency, these cards could revolutionize machine learning tasks and become pivotal in driving future innovations.
PhysX, not 3Dfx. People forget that Physx was originally an add on card.
People are misunderstanding what this card is doing. This is like an NPU (or Apple's Neural Engine), which runs matrix multiplications hardware-accelerated with efficiency in mind.

Anyway, if you want to do anything ML-related right now, the best option is to get a Mac, since Apple Silicon Macs have unified memory which serves the purpose of both RAM and VRAM. I can run 70B models on my work 64GB M2 Max MBP 14", which is just bonkers. Sure, that thing costs $3700 if bought new from Apple directly. If you want the absolute best, you can get an M3 Max 128GB unified memory MBP 14" for $4800, which is steep, but with 128GB of "VRAM" you can run 120B+ models like Goliath (Goliath takes up 70GB of VRAM, and Goliath is amazing).

I can see it as a long-term investment, and if you are a developer, buying a Mac Studio Ultra or an MBP with a ton of memory makes sense if you are thinking 5+ years into the future. Macs go up to 192GB for the Ultra, btw. The inference speed is also quite nice. My RTX 3070 only has 8GB of VRAM, for example, but my personal M2 MBA 24GB has memory equivalent to a 4090 (which I got for $1600 on eBay). Also, considering how efficient Apple Silicon is, you are going to save a lot of money on energy costs by running your inference and training on a Mac.
A low-power but high-memory GPU-like card specialized for ML, like a TPU but with proper Torch support and something like 32GB of fast memory, would be really interesting.
Very interesting! What a time to be alive!
my year-old hailo-8 m.2 AI accelerators are sad that they missed out on being called first gen ._.
Will these boost VRAM for AI applications or only processing speed? I'd love to see these have like 4, 8, or 12GB of VRAM that can combine with your main GPU, and also boost some speed. It would be great if they could be used for Stable Diffusion but also local LLMs.
Where can I buy?
"Jensen ! Jensen ! We are here, please buy us !"
I'd love to just have a compression/decompression add on card that is 50x faster than a "hot" multicore CPU. Something I could hand a 1/2TB VM file to and have it squeezed down in 2 minutes.
To beat GPUs, these vendors need to provide about an order of magnitude more high bandwidth VRAM!
Can it accelerate machine learning and deep learning training?
So... if you only want AI stuff like Stable Diffusion or an image background remover, etc., and you don't play 3D games, you won't need to buy an expensive GPU anymore? How about the memory? GPUs cannot use system memory and must use their own VRAM. Do these AI accelerator cards have their own RAM, or do they use system RAM?
I was kind of under the impression AI needed space to learn and grow in size. So how is this PCI accelerator card artificially intelligent? I'm not allocating HDD/SSD space for it.
Looks interesting, but what everyday application supports them. i.e. Why would I currently want one?
Why are they calling that thing a "Think Pad" when it's clearly a "Think Box"?
Great for an APU like the 8700G, offloading the AI work.
Cool. Can this run on Linux?
It's Ageia PhysX add-in cards all over again...
Dude, explain the use case and how.
If these dedicated NPUs are compelling enough, I imagine Nvidia will just add them into their GPUs rather than us having to buy yet another dedicated add-in card.
The A.I. acceleration will be done on video cards, just like PhysX. Though I do like the idea, as I play with local AI a lot.
Will it work in a laptop?
What is the big advantage over GPUs doing AI? More power? More specifically made for that task?
Seeing as M.2 NPUs are on display here, I would expect significantly less power. It might be intended as an upgrade path forward for "legacy" x86 IBM/ATX computers.
GPUs may be less marketable as an upgrade, being expensive, loud/thermally demanding, and having a reputation for being for "gamers" and not general purpose office work.
I'm seeing them more likely having an advantage over x86 Intel CPUs, using legacy x86 code which is centered around ordered instructions across a limited number of cores. NPUs would provide a larger "core count" for heavily paralleled workloads that could be executed out of order with the oversight of an AI engine.
@@marvinmallette6795 Cool, thank you for the explanation!
Google did it earlier (2019) with Coral ai m.2 (TPU) cards.
Does Coral ai show up in the Windows 11 task manager?
I think he was one of the legendary people who j3rked off the old Lara. Epic
I also remember physics accelerator add in cards.
*Google Coral released in 2018* :Am I a joke to you?
How many people will name it Hal?
I'm sorry, peaceonearth, I'm afraid I can't do that
"I am sorry Dave, I can't close the DVD door."
/s
lol, the pose sensor pointing at the camera guy.
Make your AI Companion not only super hot but super smart with our AI Accelerator cards!!
Practical use of A.I. 🤔
@@SC-hk6ui Digital friends.
This is not the first. You should look into some of the pcie ai accelerators that are already in use.
3dfx legend is Back
If only these were available as separate hardware for end users, to be used with any PC!
PC building experts will likely need to review the hardware to confirm there is enough PCI-Express bandwidth to ensure proper operating system stability. It seems like just "any PC" would be likely to suffer crashes and reboots due to SSD disconnects should the shared PCI-Express bus experience excessive congestion.
It does look from the thumbnail that such is the notion, separate M.2 hardware compatible with any PC, that has the available shared PCI-Express bandwidth... Probably best not to pair it with an NVIDIA Geforce or AMD Radeon Graphics Processor.
Imagine having a last gen pc that can't run new unreal 5 games smoothly and buying one of these pretty much makes it a viable solution, instead of a whole expensive pc overhaul.
Blue Screen of Death when your SSD disconnects because your shared PCI-Express bandwidth was overloaded.
This topic has not received the attention that it really should have, M.2 dedicated AI AICs mean ANY pc can run an NPU at relatively low upgrade cost. These need to be priced accordingly and widely available, like SSDs.
If it's faster than my RTX 4080, I'll buy one for my next PC. My current one already has enough of a shortage of M.2 slots without this.