@@GaryExplains Llama 3.3 70B at 4-bit requires the GPU to read roughly 40GB per token. If the bandwidth is less than 400-500GB/s, the actual performance would be less than 10 tokens/s.
So approx 128GB of VRAM for inference work? So with a quantized model we are looking at an upper limit of, again approx, a 256B parameter model? Is that roughly how that should work?
@@frankjohannessen6383 Given it was designed to connect to a main PC to offload AI processing, I wonder if you can run them in a small cluster? EDIT: sorry, I just found the following on their website... "The GB10 Superchip enables Project DIGITS to deliver powerful performance using only a standard electrical outlet. Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage. With the supercomputer, developers can run up to 200-billion-parameter large language models to supercharge AI innovation. In addition, using NVIDIA ConnectX® networking, two Project DIGITS AI supercomputers can be linked to run up to 405-billion-parameter models."
Hope you know that workstation is a euphemism for reinforcement training, as in you're working on this but what you are really doing is training your replacement.
@ How much do you think AI is useful or good for society? At some point progress is regressive; as a matter of fact, look at all the videos of advanced societies on our planet that are receding because people decided they don't want kids. That's why all this immigration is happening. There's a point where people start losing hope because of all the advancements. Not that you have a choice, it's too late, but who do you think has all these machines? Do you think they always have everyone's best interests in mind? I was looking at a video of a CIA agent saying that the government has computers that can store all the calls, etc., of all Americans for the next 500 years. If we had equal access to it when the government does something wrong I would be OK with it, but do you think that's going to be the case?
@@RealAsItGetz This kind of thing will just speed up AI and robots taking over more and more jobs - problem is, as people have pointed out before; the targets for job automation tend to be those with the highest employee numbers - think factory and warehouse workers - and workers who usually demand a fairly high premium for their work (artists, writers, engineers, film production etc) so if too many people end up losing their jobs, who's gonna buy or pay monthly subscriptions to all the products these robots and AI are producing? All these companies rushing to develop and advance AI, automation and robotics don't seem to understand that they're ultimately shooting themselves in the foot in the long run, all for a short-term boost in profits. LLMs and Apps are also gradually stealing away our access and use of the general internet - even Google results are pushing AI summaries instead of having users GO to the websites it finds. We're slowly giving a select few corporations total control over the internet and how we interact with it/what we do on it. NOT a good thing!
@ they know that they are just fighting for that top spot, it’s obvious to me and you. They are actively working for that. They just can’t stop competing with each other.
How many CUDA cores does it have? Wouldn't a 5090 give more compute power for data mining, albeit at higher power consumption, assuming we are not after AI computing performance? Probably a naive question.
This uses an Arm CPU, which will probably mean that only NVIDIA's distribution will be compatible. This is probably a good deal if you need to run AI models on it without heating your office.
This is at least 8 times cheaper per teraflop than current solutions for the desktop. I am starting to save for this one...
Keep in mind that is FP4 performance. The only big advantage is the large amount of shared memory. The similar product from AMD is the Strix Halo: same 128GB, but with 16 Zen 5 cores, so x86, but likely much lower FP4 performance, though I'm not sure that is relevant given the limited memory bandwidth.
@@electrodacus As far as I know AMD is only FP8. I mean you can run FP4 on it, but you only get the size-reduction benefit for the model. But Ngreedia will be WAY faster at FP4.
@@cajampa Even FP8 AMD performance will be much lower than Nvidia's, but I'm not sure any of that will have an impact on large models where memory bandwidth will be the limiting factor. The AMD AI MAX+ has 256GB/s; not sure about this Nvidia mini PC. I'm not in a hurry, so I will wait to see how they all compare, but since I need a normal Linux computer that can also run LLMs, the AMD will probably be better due to x86 compatibility and a likely much better price.
@electrodacus Good points bro
Exactly. Best news of CES so far. Can't wait for it. I don't get why Intel didn't release Arc cards with multiples of 48GB of RAM, as they can't compete on raw compute power anyway. How do Intel and AMD think they can attract developers to OpenVINO without such offerings? Nvidia shows with offerings like this WHY they are the market leader...
At this price it's gonna fly off the shelves. These AI boxes could be in a class of their own going forward.
How do you know? What is their capacity compared to, say, a 5090?
@@NisseOhlsen The 5090 only has 32GB, but that's designed for gaming and 3D graphics. This includes 128GB and also has a CPU on the same chip, and very likely it'll strip out most of the gaming-specific hardware like shaders and ray-tracing cores.
@@NeonNoodleNexus come on. I'm not a novice, OK? You are confusing 128 GB of RAM with 32 GB of on-GPU VRAM. It's not the same. You understand the difference?
@@NisseOhlsen This is an SoC. The memory is shared with the SoC, like on a games console. For example, the PS5 doesn't have its own separate VRAM; it's all shared.
@@NeonNoodleNexus Wow, thanks! This isn't a 128 GB GPU, right? So that means the 128 GB is not exactly on the GPU, right (otherwise no-one would ever buy an H100 from here on)? So what's the difference? Is this a GPU with 128 GB of RAM on board? If so, what is the latency difference between this setup and a conventional PC with a GPU and 128 GB of DDR4/5 RAM? Concrete example: RTX 5090. Thanks!
This is actually democratizing AI. This is huge! This will make some amazing things happen. This is the coolest thing I've heard in years. This will change computing and the way we compute, and it will change all of our industries very quickly. My jaw literally dropped when they announced this!
Great coverage! It was the part of the presentation that left me with the most questions -- which you mostly answered.
For a dedicated AI machine running at that level, 3k honestly isn't that bad. If you need a cheaper AI dedicated system for smart home use, try the Jensen Nano for $250.
I see what you did there.....😏
Jenson and the Jetsons!
Probably the tiers are:
* Jetson at $250 and 8GB
* Mac mini M4 starting at $600 and 16GB
* Larger Mac mini M4 configurations as needed, up to 64GB IIRC
* DIGITS "GB10" with 128GB
Unless Nvidia brings a price drop for the Orin NX 16GB, of course
$3,000 is the actual price for a Jetson AGX Orin 64GB (6-core CPU with an Ampere GPU, same hardware as the Switch 2).
So for the same price, we get double the RAM and skip one GPU generation? Forget Windows; with that kind of processing power, QEMU with Proton/Wine could be handled by just a couple of cores.
The CPU won't be very powerful.
@@Vanastarr It has 20 cores...
"Starting at $3,000". The 128GB, 4TB, 20-core, 1 PFLOP version will probably be a lot more than $3,000.
@@frankjohannessen6383 well NVIDIA made the call.. it's not for me to make sense of it.. just to buy it if the specs are good and the price is right.
@@garystinten9339 Doesn't matter if they are weak. It will be weaker than a 6 core 9600X.
Even if I end up having to get multiple units so as to scale, I envision being able to run an AGI model locally. This is an automatic buy and is a much more important product than the 5000 series.
Yep! the future is open source, uncensored, locally run AI/AGI! The future looks bright! :)
amen, hail rokos basilisk
@@Machiavelli2pcok I am looking forward to seeing this work out individually for those who can afford and pull it off
AI WILL DO SO MUCH FOR HUMANITY. WE'RE ON THE BRINK OF AGI, WHICH IS GONNA MEAN: MAKING NUCLEAR FUSION A REAL POWER SOURCE, CURING CANCER AND SO MUCH MORE!!!! SUCH A GREAT GREEN FUTURE AHEAD OF US!!! IT'S INCREDIBLE HOW FAR HUMANITY HAS COME WITHOUT ALREADY BLOWING ITSELF UP FIVE TIMES OVER!!! HUMANS ARE SO CRUEL AND SUBJECTIVE, I'M SO FOR AI, AFTER HAVING ACHIEVED SINGULARITY, MAKING DECISIONS FOR US HUMANS AND STRAIGHT-UP SKIPPING ALL THE PEOPLE WE USED TO VOTE FOR, NOT EVEN TO MENTION DICTATORS!!!
MAN I CAN'T WAIT FOR UNIVERSAL BASIC INCOME FOR THE WHOLE WORLD!❤❤❤
FINALLY WE'RE ABOUT TO GET THE SOLUTION TO ALL OF THE WORLD'S PROBLEMS!!!
MAY JENSEN HUANG BLESS GOD!
I find the *"starting at"* part of the price for this to be concerning.
Yes. Likely some variants with fewer CPU and GPU cores, maybe even less memory.
@electrodacus I think it is more about storage. The base model may have 1TB of storage for $3K, and then more storage means a higher price. I doubt we will see any CPU or GPU variants.
@@GaryExplains I'm not sure about that, since the manufacturing process for such large chips is not perfect and they will need to sell those that have small defects. Also, $3K seems way too good.
@@electrodacus Why would they mention connecting 2 if you could just get a more powerful one?
Yeah, I also think that there will be more variants and not just for the storage. They did it with the Jetson AGX Orin as well. Monetarily this makes total sense when you look at the yields of such powerful chips. Selling otherwise worthless ones for cheaper is perfect. Unfortunately, this means the $3K will just be the starting price, but hey - maybe I'm wrong. Wouldn't mind it, if I am ;-)
Most exciting announcement so far
Don’t see anyone talking about it, but this thing is beautiful. Really pretty design.
You missed the teaser that they can connect two of these together.
More than 2
What is the memory bandwidth, did Nvidia or Mediatek say?
This is the question.
The other Grace Blackwell chips have 512GB/s. It also mentioned NVLink-C2C, which was previously specified at 297GB/s.
The Register says it appears to have 6 LPDDR5X chips, and they speculate it has 825GB/s at 8800MT/s.
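A quick back-of-the-envelope check on that speculation (a rough sketch; the 768-bit bus width and the 8800 MT/s figure are assumptions, not confirmed specs for this box):

```python
# Rough LPDDR5X bandwidth estimate: transfers per second * bus width in bytes.
# Assumed values only - NOT confirmed GB10 specs.
mt_per_s = 8800e6        # 8800 MT/s, as speculated
bus_width_bits = 768     # e.g. six 128-bit LPDDR5X packages (assumption)

bandwidth_gb_s = mt_per_s * (bus_width_bits / 8) / 1e9
print(f"~{bandwidth_gb_s:.0f} GB/s")  # ~845 GB/s, same ballpark as the 825 GB/s guess
```

A narrower bus (say 256-bit) would land closer to the ~270 GB/s class instead, so the bus width is the whole question.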
Are we beginning to see the extinction of Windows and Intel?
Maybe but they will start making ARM chips too
Windows can be run on ARM. Why would it end?
Finally! One can only hope.
@@rickjason215 Ehh, it can, but many of the programs can't.
What we are seeing is the extinction of Windows and Intel in the modern AI data center.
That won't affect the general desktop market much. But we will certainly see a rise of Arm-based developer workstations in the future.
They must have been fighting like hell to get that acronym
When the mini PC is faster than your workstation.
For perspective, the price point for a Gateway or Dell back in 1997 was $2,500. That was for cutting-edge tech: a 33.6k modem and an 800-megabyte hard drive with 32 megs of RAM. I think it ran Windows 3.2. So for the capability offered now, it is worth it.
Also for perspective, Roadrunner was the first supercomputer to break the petaflop barrier back in 2008. It would have set you back >$100m and you’d need 2 MW of power to run it. Insane.
That's about $5000 today (it's called inflation)...
what is cooling the thing
Way more performant than a 5090, for just a little bit more money.
For some application scenarios (big LLMs that need a lot of memory) this is true, for others not: slower unified/shared memory, less raw performance.
But it's good that we can choose 😎 The market is becoming more diverse.
NVIDIA is moving in on some industries/vendors:
- Inference services like OpenRouter and GPU cloud providers
- Apple (similar concept to M4 with unified RAM on Mac Mini and Mac Pro)
- Intel/AMD: Clear decision against x86
With the 5000 series, Nvidia is greedier than ever and has monopolistic business practices. However, the new products around Jetson Orin Super and now the DIGITS system fill very interesting gaps in the lower and middle price segment: Robotic, developer, "private use" like homelabs
There is not only (almost) unaffordable consumer hardware and (at least for me as an individual developer) unaffordable enterprise hardware anymore. We also have the possibility to run larger models locally now.
At a price you can't even pay out of your petty cash, but to be honest, if you're in the target market for these systems, this is a more than fair offer.
I would have liked to see such solutions from AMD, along with a competitive software stack. As it is, AMD will continue to lag behind, but this competition is extremely good for us as customers, and I am sure we will be presented with suitable alternatives in the near future.
I'm also pinning my hopes on mini-PC vendors, who haven't made office PCs for long but are getting more creative in how they assemble their systems.
I hope Nvidia won't drop support for it after a few years like they tend to do for their Jetson boards.
@Gary Explains: Two of these benchmarks are misleading, and they are both linked to FP4. It's a 4-bit floating-point number, almost the smallest possible float (and the smallest that can follow IEEE compatibility features).
It says "a petaflop of performance", but it's a petaflop of 4-bit performance.
The worse of the two is the "200-billion-parameter language model" claim. Well, of course: it has 128GB of RAM. With each "parameter" being a single 4-bit float, you just double the RAM in GB to get the total number of FP4 values you can hold locally: 256 billion.
I definitely think it is a very good idea and a very wise risk from Nvidia. Who knows where this will go? I don't. They don't. They probably aren't planning for it to sell in huge numbers. A small form-factor personal computer for AI sounds great though, and it injects a spark of life into the long-dead "workstation" market, which is where Nvidia, who rose from the ashes of SGI, came from. Via the catalyst of Nvidia, the since-demised SGI has been reincarnated, transforming from a graphics workstation vendor into a high-end AI workstation and supercomputer vendor. SGI didn't die, it became the DGX pod. :)
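To make that arithmetic explicit (a rough sketch that ignores KV cache, activations and runtime overhead):

```python
# How many FP4 parameters fit in 128 GB of unified memory, weights only?
memory_bytes = 128e9        # 128 GB
bytes_per_param_fp4 = 0.5   # 4 bits = half a byte per parameter

max_params = memory_bytes / bytes_per_param_fp4
print(f"{max_params / 1e9:.0f}B parameters")  # ~256B, so a 200B model leaves some headroom
```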
Maybe when 32-bit float arrives it will become game-changing in media production.
@JohnnoWaldmann What do you mean by "when 32 bit float arrives"? This NVIDIA box supports FP32, it is just that we don't have any performance numbers at the moment.
@ Interesting, the audio description repeatedly referred to a 4-bit limitation.
What about power consumption and cooling that tiny box?
shhhh
more importantly, can it heat your home? lol
It's a 3nm chip lol... it's an SoC using the latest process node.
Not made by Intel!
needs a nuclear power plant.... but they are selling THOSE too!
Finally, someone who can summarize the basics. TY
Imagine Qualcomm also scaled their Oryon X cores in a similar manner. That would actually be insane.
Is performance (not memory size) comparable to upcoming 5090?
Only for FP4.
Probably not, though the performance per watt could be better.
From the chip picture, it looks like it has 48 streaming multiprocessors (128 CUDA cores each = 6,144), the same as the RTX 5070, so similar performance. Even the FP4 performance is the same in their presentation. But it's suspicious that a 250W GPU would be in such a small box. I bet inference speed at FP16-FP8 will be 0.5-0.8x the speed of an RTX 4090, so probably the performance of a standard RTX 3080 or A6000, but with 128GB of memory. I think that is a great alternative to used 48GB cards ($3,000+); for AI they are pointless now.
Its memory bandwidth is less than an RTX 4070's (LPDDR5 is slower than GDDR6X).
Memory speed is the concern with low-power DDR5X; it typically runs at 8.5Gbps, which is very slow.
Will it be possible to use it with Stable Diffusion, LM Studio or Hunyuan Video? Will it be faster than my current Nvidia 4070 Ti?
We don't know the performance yet.
Nice, but where's the new Nvidia Shield Pro to replace my 2017 model?
I still have the OG 2015 version and am ready for a new one
Thanks Gary for explaining the product. I was watching the Nvidia's livestream. Didn't understand what this thing was.
So basically it's a device specifically built to run local LLMs, right? So now it can take the crown from the Mac Studio for being the more affordable option to run LLMs locally?
I have one more question: can it be run as a cluster to run even bigger models than 128GB allows?
Yes it can! You can link up 2 to run 405B models in FP4. I don't see why you could not link up more, since it just uses ConnectX.
Wow 405b locally 😮
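That 405B figure lines up with simple FP4 sizing, assuming the weights dominate memory use (a rough sketch, not official sizing; KV cache and overhead ignored):

```python
# Does a 405B-parameter model at FP4 fit on one or two linked DIGITS units?
params = 405e9
weights_gb = params * 0.5 / 1e9                        # 4-bit weights -> 0.5 bytes/param
print(f"weights: ~{weights_gb:.0f} GB")                # roughly 200 GB
print("fits in one 128 GB unit:", weights_gb <= 128)   # False
print("fits in two linked units:", weights_gb <= 256)  # True
```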
Very interesting to see how this comes along - Apple silicon competitors I am all for.
How does it handle heat ?
the 3k might be a deal, depends on how well this competes with the apple silicon, does the gpu get the benefits of the complete unified memory like the apple m series?
average dumb apple user, of course it does, and it performs way better than apple silicon
@ ok fan boy
So these things have an mediatek arm cpu with an nvidia GPU.
How are these related or comparable to super computers?
Are there many useful or widely used models based on FP4 parameters? I really don't know but my guess would be: no. So what does that petaflop headline performance reduce to in FP8? Does it scale linearly with size, so, halved?
Most of the open source models like Llama and Gemma have 4bit versions. If you run an LLM on your desktop using Ollama it will likely be a 4 bit version. With LM Studio you can opt for which quantized version you download. Same with llama.cpp.
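For example, a minimal sketch with llama-cpp-python, assuming you already have a 4-bit GGUF file downloaded (the model path below is a placeholder, not a specific release):

```python
# Load and run a 4-bit quantized GGUF model locally with llama-cpp-python.
# Q4_K_M is a common 4-bit quantization variant; the file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit
    n_ctx=4096,
)

out = llm("Explain FP4 quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```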
@@GaryExplains Aha! So perhaps no need to caveat that performance number. Thanks for the extra info.
The caveat is because the FP8, 16 and 32 performance numbers will be different.
@@GaryExplains Right, yes, it needs the qualification because otherwise the number is meaningless, but it doesn't need a *warning* like "OK but fair warning, in practice," etc etc. Apparently it *is* a practical number.
@@GaryExplains Yeah, but with 4-bit the quality drop is significant for precision tasks like coding. I run at 6-bit at the very least, preferably at 8. So I'm not sure how useful this box will be. Maybe next gen, with faster bandwidth.
For what purposes would you use this mini PC? Private use or for a company?
Researchers, AI developers, education and research, small/medium businesses deploying AI solutions.
Oh, I was expecting a price in the ballpark of 10 grand.
me too! That's a bargain for that performance and memory in my opinion!
"prices starting at $3000". My guess is the version with the provided specs will be around $12.000, and the $3000-version will have a scaled down cpu&gpu, 32GB ram and 1TB storage. Won't mind being wrong though.
This really is a steal going by the specs. But benchmarks are needed. It bugs me that he didn't show any.
How is this mini PC cooled? I don't see an internal fan in the picture..
When is the GB20 coming? I can't afford the GB10 at USD 3,000, and I need 2 of them; that is my issue.
Can it run stable diffusion?
Of course
Yeah, I'm going to get two of these. This should be interesting.
An interesting comparison would be against a desktop with a high-end graphics card like a 4090; yes, the form factor isn't as nice, but I wonder about the price-to-performance ratio.
The limiting factor there is the VRAM. You have to fit your model into 24GB. With this, even if the computation is slower overall, the fact that you can even run a 128GB model is amazing. I can wait an extra couple of seconds and access something the 4090 isn't capable of.
1. To run the same models, you would need 4x 5090 (32GB x 4), which would be crazy expensive, both upfront and in energy consumption.
2. Even though clock rates might be slower, this is never the limiting factor in AI systems. The limiting factor is memory transfer speed (which this is optimized for). In both training and inference, GPU cores are idle more often than not, 'waiting' for data to be transferred.
@@mairex3803 and 122333 That is the kind of comparison I was looking for. While I am not currently planning on training AI systems, that shows the value of the mini PC.
At 3:10, the video demonstrates that the memory isn't integrated into the chip but consists of 8 (or 16?) LPDDR5x modules surrounding the SoC.
Wow, that's bad, the M4 Max will wipe the floor with this.
It would be good if you could connect this to your 5090 directly to get it to help 'build your AI', then run it standalone. Maybe NVLink over one of, or instead of, the 3 DP connections?
I want to know how powerful the built in blackwell GPU is. I'm guessing it exceeds the 5090 in terms of compute
@@MA-jz4yc Yes, the Grace Blackwell GPU is 1 PFLOPS (at FP4). The 5090 is a little less than that, so almost the same, but with the extra unified memory.
@@sorucrab "The RTX 5090 boasts **3,352 trillion AI operations" hopefully it's 'apples to apples" but I also found "GB10 Grace Blackwell Superchip delivers **up to 1 petaflop of AI performance** at FP4"
I'm sure the 5090 would be useful in certain cases - maybe FP8 based.
Gonna buy both either way..
@@MA-jz4yc GB10 Grace Blackwell Superchip delivers **up to 1 petaflop of AI performance** at FP4. Can't find 5090 FP4 info.
The limiting factor is memory transfer speeds... Which this is optimized for with its unified memory. No consumer GPU can compete. The exact FLOPS do not matter, as GPU cores are more often than not idle waiting for memory being transferred most of the time
I guess it would make a lot of sense keeping everything local; you could run several LLMs at once or use it for training. But do you need it if you can run, say, 3-5 Jetsons doing LLMs the same way?
bingo!
I replaced my older, big desktop with a Khadas Mind Premium mini computer in November 2023 and I love it. Added on their Mind Graphics unit in 2024 and love it even more. The Khadas Mind Graphics unit uses an Nvidia GeForce RTX 4060 Ti desktop GPU and has 8GB of GDDR6 memory. It is an awesome setup. The computer and graphics unit only cost me $1,640.00 because I got it on Kickstarter when they first started doing their preorders. Might be interested in upgrading to this new Nvidia mini PC, but $3K is twice what I spent for mine, so I will have to see the reviews and specs first, I guess.
It's not a PC. This is basically a mini server and most will use a secondary device to interface with it
@@MrTmansmooth Ah, got it. I was thrown off by the caption that says new mini pc. He should probably clarify that a bit for novices like me.
So, when and where can this be bought?
The HP Z2 Mini G1a is coming out around the same time. It's based on AMD Strix Halo and will have very similar capabilities (also 128GB of unified memory shared between the CPU and GPU).
The problem is that NVIDIA is the industry standard for AI because of CUDA, not AMD. Using an AMD workstation for AI is currently not advisable. CUDA is the de facto standard.
@@GaryExplains ROCm is apparently coming along nicely. Yes, CUDA is currently going to be easier but that's not to say that alternatives aren't starting to gain traction. And if you want something that can also be a general purpose PC then the argument can be made that that will be easier with AMD's x86 compatible cores than it will be with Digit's ARM cores. I'm still undecided about which direction I'll take. I worry about NVidia's support for hardware like this - it requires a specially modified Ubuntu 22.04. How long will they support that? It's already 3 years behind.
Nah, an NVIDIA small-form factor PC with 128 GB of RAM & 8 TB SSD with AI and ARM-based chips would absolutely be the correct configuration to sell - the rest is too "also ran" in nature to compete with Apple currently. Apple offers this config on their higher priced MBPs with an M4 Pro chip already. I think people would most certainly buy this mini Windows PC at $3k. Agreed; at that config and price, "it will fly off the shelves"!
What would the performance be against A100/H100 cards?
Disclaimer: We don't know the actual performance, so this is an estimate based on historical trends.
With that said: generally for this class of product, Nvidia lists a low-precision operation (FP4) in a sparse configuration (you are almost certainly not taking advantage of that if you need to ask), so for a lot of realistic use cases you're probably going to want to divide the listed operations by about 8. I.e., you would expect this to perform like a 125 TFLOP GPU.
The A100 has ~80 TFLOPs FP16, but the BF16 performance is probably fairer (because I'm not willing to go to the whitepaper to find the tensor core performance), so 311 TFLOPs we could say.
Based on historical trends, I'm guessing the bandwidth of DIGITS is more similar to the Jetson models than their HBM GPUs, so you would expect somewhere around 125 to 350GB/s of bandwidth with a reasonably high degree of confidence, compared to the 1.5TB/s of the A100.
All in all: If I were looking at buying one, I would expect to get around 1.5x (arguably 1.8x in cases where you were using two A100 40GB cards with some networking losses) to 1/8th the performance of an A100 depending on the specific use case. For low context Transformer inference you'd expect closer to the 1/8th mark, but for convolutional nets it could be anywhere in the 1.5x to 1/3 mark depending on the exact network and its datatypes.
This is assuming existing models, and future ones might be more dependent on FP4 operations, or might take advantage of sparsity in an intentional way, so you could potentially get better performance in the future, but there isn't currently a clear path to models you would want to run using those advancements, so I wouldn't bank on it personally.
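For what it's worth, the arithmetic behind that "divide by about 8" heuristic looks like this (a sketch; the 2x sparsity factor and the 2x-per-precision-step scaling are the assumptions above, not published numbers):

```python
# From the 1 PFLOP sparse-FP4 headline number to a dense-FP16 estimate.
listed_tflops_fp4_sparse = 1000.0            # 1 PFLOP, as marketed

dense_fp4 = listed_tflops_fp4_sparse / 2     # drop the 2:4 sparsity assumption -> ~500 TFLOPS
dense_fp8 = dense_fp4 / 2                    # ~250 TFLOPS
dense_fp16 = dense_fp8 / 2                   # ~125 TFLOPS, the "125 TFLOP GPU" figure above

print(dense_fp16)  # 125.0
```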
@@novantha1 I'm not sure about the bandwidth. Would Nvidia call this a "mini AI supercomputer" if the bandwidth were similar to an Apple Mac mini with an M4 Pro chip or the upcoming Strix Halo APU? I'm hoping Nvidia has something more than that up their sleeve.
@@frankjohannessen6383 “Would the company that always hypes up their products with absurd monikers hype up this new product line?”
Yes. Yes they would. They’re calling it a supercomputer in the sense that it can fine tune decently sized language models, not because it’s crazy powerful. They already almost certainly lied about the FLOPs for practical purposes (by giving us sparse operations), so my expectation is that the bandwidth isn’t too crazy. Keep in mind that it’s LPDDR; I’ve never seen an LPDDR system go beyond maybe 200GB/s on the upper end.
@@novantha1 Thank you for taking the time. Right now, what should be, in your opinion, the option to go for? I was considering a System76 x86 workstation with dual RTX 6000 Ada, but now should I wait for System76, or is there any other alternative? Thanks in advance...
obvious opportunity for AMD to make a 256GB version for 2000$
Won't matter. AMD's software stack for AI work is pretty terrible. See recent news/details by SemiAnalysis (Dec 22, 2024 post), where they ran into multiple roadblocks getting things to work on the MI300X when they were doing performance evals/tests against its Nvidia counterparts. Nvidia has been investing heavily in CUDA for years, while AMD has not, and this won't change any time soon, especially as Nvidia keeps updating and adding to CUDA.
@@shadow7037932 Thanks for the pointer, but that wasn't my takeaway from the verbose, whiny article. I'm sure there are some software issues, and that's pathetic, because getting that right might be worth $10bn or more to AMD, so you pay to make it happen at all costs. But I note two key things from the article (which should have been a YouTube video instead). On 16-bit training, the AMD card beat both the H100 AND the H200 on Llama 8B and 70B; indeed, it seems the larger the model, the more AMD shines. And they never tested a model bigger than could fit in H100 memory. Almost as if they are biased little... The second very important thing I noticed is that they never tested inference, which would be large-scale user usage in data warehouses, but they say an article is coming, and bandwidth is very important for inference. Take a wild guess whether that means AMD will win there too. A side note is that the H200 barely performed better than the H100, which is rather puzzling for Nvidia, but that's another matter. I note that when they use the very recent WIC build, whatever that is, it does rather well. For all the problems they were having with software, it sounds like it's approaching usable, and indeed not only comparable to but beating Nvidia. And they never even tested a model that couldn't fit in H100 memory. That's lame.
For investments as huge as the big companies are making, they should be able to manage whipping the software the last few steps to usable; it doesn't have to be super elegant to set up before it becomes decisive. And all their testing was in large clusters; they never tested just a single unit. AMD's cards might beat Nvidia's rather savagely for models larger than can fit in H100 memory. Think about that. They were testing 1.5G-param models, and no one uses models that small. If the software is bordering on usable, AMD should undercut the price savagely, like $5K for a 256GB card, and murder Nvidia. Same on consumer cards, and on the miniature box.
@@shadow7037932 Agreed. Software is arguably the only moat that Nvidia has.
Genuine question: why can't the Switch 2 be a cut-down version of this?
Edit: I just watched till the end, and I somewhat understand. But surely cutting this down by an eighth should work out?
What is this computer for? Can I use it for gaming or something more casual?
Did you watch the video?
How does this compare to the 5090 at $2K in terms of 32-bit FP ops?
Apples to oranges comparison: 32GB vs 128GB. The 5090 will be faster on 32GB models or smaller, but slower than the GB10 for anything above 32GB, as the RAM is the limiting factor. Not sure what the memory bandwidth will be for the GB10, though. And models are usually used at FP16 at most.
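As a rough illustration of why capacity, not raw speed, decides which box can even run a given model (weights only; KV cache and overhead ignored):

```python
# Which models fit in 32 GB (RTX 5090) vs 128 GB (GB10)? Weights only.
def weights_gb(params_billions, bytes_per_param):
    # 1 billion params * 1 byte/param = 1 GB
    return params_billions * bytes_per_param

for name, params_b in [("8B", 8), ("70B", 70), ("200B", 200)]:
    fp16 = weights_gb(params_b, 2.0)   # FP16: 2 bytes per parameter
    q4 = weights_gb(params_b, 0.5)     # 4-bit: 0.5 bytes per parameter
    print(f"{name}: FP16 ~{fp16:.0f} GB, Q4 ~{q4:.0f} GB "
          f"(fits 5090 at Q4: {q4 <= 32}, fits GB10 at Q4: {q4 <= 128})")
```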
This would be a game changer for large batch training diffusion models, I wonder how hard it will be to get lol.
The cutdown MediaTek Windows version might be a bit more home affordable
4 bits? It's FP4, floating point 4.
I think that is generally less accurate for training and better for performance, but if it does FP16, which is generally the most common and more accurate, it could be a quarter of the performance when training at FP16.
will it be stronger at training or inference?
IMHO this doesn't really have any competition in hardware, but it will have fierce competition in software/quantization. Do you really need a 200B-param model? Or can you get by with a quantized 8B or 16B parameter model that calls experts in the cloud?
What's the I/O on a machine like this? Or is that TBD
So we can run Ubuntu Linux and make clusters?
Yes, NVIDIA mentioned somewhere about using two of these together via ConnectX.
Hello, I'm not very knowledgeable about these things, but are we sure that you can only connect 2 of these? There are 2 ports I don't recognize on the back, and it seems like, if this is a scaled-down server architecture, linking together multiples of these would be more useful.
1:25 apparently the acronym was also generated with 4-bit precision
And how to connect to this box? Is it Thunderbolt 4 or what? Can it be used with a laptop, for running local LLMs?
This isn't an add-on, it is a complete computer.
SSH
How many TOPS?
1000
@ thanks
It's got 4x the memory of a 5090, but I suspect it'll run slower. Hopefully the 5090 will have hardware 4-bit quantization too.
The 5090 won't even run many of the models that this can. The 5090 is designed for graphics, otherwise it would have a lot more memory.
What applications are available for the Mini PC?
Are you familiar with Linux?
@@GaryExplains Yes.
Then, in terms of available applications, consider it to be the same as any other Arm based Linux computer.
@@GaryExplains Thanks
How does it compare to M4 Max? Can M4 Max also run 200B model locally?
How much RAM in this Mac?
@ It can go up to 128GB. But I wonder how big the difference in GPU/NPU performance is for deep learning in terms of speed.
The ability for an M4 Max based machine to run a 200B models depends on the memory not the GPU/NPU.
@ Yes. But I was referring to pure compute power of M4 Max vs DIGITS GPU because that will determine the token/second for the LLM.
But you asked about a 200B parameter model; you can't run it if you can't load it into memory. Then, once you have it in memory, memory bandwidth is also a key factor, not just the GPU/NPU, because you basically need to access 128GB of RAM per token. I am only saying this because you specifically mentioned 200B models.
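To put rough numbers on that: for memory-bandwidth-bound generation, tokens/s is roughly bandwidth divided by the bytes read per token. A crude sketch (the bandwidth values are placeholders for illustration, since Nvidia hasn't published one for this box; a dense model is assumed):

```python
# Crude bandwidth-bound estimate for a dense 200B model at 4-bit (~100 GB read per token).
weights_gb = 200e9 * 0.5 / 1e9   # bytes of weights, expressed in GB

for bw_gb_s in (273, 500, 825):  # assumed bandwidth figures, not confirmed specs
    print(f"{bw_gb_s} GB/s -> ~{bw_gb_s / weights_gb:.1f} tokens/s")
```

So even with optimistic bandwidth, a 200B dense model would be single-digit tokens per second.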
MediaTek produces ARM based chips, which use less energy, so also produce less heat.
I guess for Windows desktop use you can just connect to it remotely.
And Nvidia produces Arm based chips which use less energy, so also produce less heat. I am still unclear about MediaTek's collaboration in this venture.
I want to see the performance of this mini pc
could this be a low power mining rig?
Why didn't they specify VRAM quantity? Is it unified with the 128GB of DDR5 RAM?
Unified yes.
In my thesis I process large mathematical databases. Do you think this GB10 Mini PC could be used? I have no experience with Linux.
Unfortunately I would need more information to give any kind of useful answer. How large? What type of processing?
Thank you for giving good info on this
Missing power consumption figures
Who cares? It's not battery powered, and it's also self-contained.
@@MrTmansmooth It's very important for the environment and the pocketbook if you're running LLMs 24x7.
With it being a mini-PC you now have me wondering where the 1kW power brick is located...
Thank you for showing me this. I would turn it into a coin farmer, use it in part to run games, plus stock market learning. I just want to get it now before the tariffs hit the market.
And Gavin Belson unveils the box!
Education pricing ??
You mentioned Windows, so I have a question. Why would you consider putting a Lambo engine in an Edsel?
Looks to me like a dev starting/entry point for what might come in the future for the consumer market. MediaTek is clearly sharing some of their CPU know-how alongside the Nvidia GPU; even though they are better known for using off-the-shelf Arm CPU cores without an architectural license, they can still optimize them better than Nvidia would alone. Also, let's not forget that MediaTek is a Taiwanese company, which in general makes them close to Jensen, though probably not as close as TSMC 😉. Collaboration with MediaTek in the past was primarily focused on the automotive industry, however the unofficial plan was always to let it grow further. It would be great to see all of this coming to fruition and to get the fruits of this close collaboration in the shape of a consumer Arm SoC using the latest Nvidia GPU cores for a reasonable price close to the M4 Mac Mini, running of course both Linux and Windows on Arm. $3K is certainly not targeting mass consumers, however for AI devs and as a high-performance edge AI machine it might be a great option to go with.
This is not a consumer product. If they develop one, the price of this product will not transfer to an entirely different one.
You should have explained the advantage of this over a standard setup with e.g. 4 RTX 5090.
You think 4xRTX 5090 is a "standard setup" 🤐
Maybe they should look up the alphabet once more :D
But in the video there are 8x16GB memory chips next to the SoC itself making up the 128GB, rather than being on the chip, and the same goes for the NVMe storage. So it's not on the chip, unless you count the board as the chip.
Do you reckon you could put SteamOS on it?
I see this as also a devkit for their Windows computers, which will probably use a cutdown version of these chips
It's Arm.
@SWOTHDRA so?
I will want to see the benchmark results with a Windows emulator.
Can it play the latest Indiana Jones PC Game at 4k?
Arm CPU, so no games.
should be able to.
@@Machiavelli2pc lolllll...
Excited for this, but I'm an AI/ML engineer, so it's applicable. Not sure if there would be any other audience for this, but I am on the notify list and will be getting it.
The limitation is obvious. It's implausible to run at 30 tokens/s with Llama-70B-Instruct at 4-bit; more like a mere 10 tokens/s.
Eh? Where are you getting these numbers from?
@@GaryExplains Llama 3.3 70B at 4-bit requires the GPU to process about 40GB per token. If the bandwidth is less than 400-500GB/s, the actual performance would be less than 10 tokens/s.
But we don't know the memory bandwidth. So you are just guessing.
@@GaryExplains It's easy enough to make an educated guess since it's mainly LPDDR5X.
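To put numbers on that educated guess (these are assumptions, not measured specs):

# Llama 3.3 70B at 4-bit is roughly 40GB of weights, and every generated
# token needs those weights streamed from memory once.
weights_gb = 40
for target_tok_s in (10, 30):
    print(f"{target_tok_s} tok/s needs ~{target_tok_s * weights_gb} GB/s of bandwidth")
# 30 tok/s would need ~1200 GB/s, which LPDDR5X is unlikely to deliver;
# ~400 GB/s, which is plausible, gives about 10 tok/s.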
FP4... isn't that kind of nerfed? I would like FP8 minimum, then I might start getting excited.
Obviously it can handle FP8 and 16 and 32, we just don't have the performance numbers for those yet.
So approx 128GB of VRAM for inference work? So with a quantized model we are looking at an upper limit of, again approx, a 256B parameter model? Is that roughly how that should work?
There is some overhead with GGUF quantization and you also need RAM for the context window. I'd say an upper limit of about 210B.
@@frankjohannessen6383 Thx
@@frankjohannessen6383 Given it was designed to connect to a main PC to offload AI processing, I wonder if you can run them in a small cluster? EDIT: sorry, I just found the following on their website... "The GB10 Superchip enables Project DIGITS to deliver powerful performance using only a standard electrical outlet. Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage. With the supercomputer, developers can run up to 200-billion-parameter large language models to supercharge AI innovation. In addition, using NVIDIA ConnectX® networking, two Project DIGITS AI supercomputers can be linked to run up to 405-billion-parameter models."
Nvidia themselves state 200B param models in FP4. You can connect 2 of these for 405B
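A rough sizing sketch behind those limits (the reserved-memory figure is my own assumption, not an NVIDIA number):

MEM_GB = 128
RESERVED_GB = 20       # assumed: OS, runtime, GGUF overhead, KV cache / context
BYTES_PER_PARAM = 0.5  # 4-bit weights

max_params = (MEM_GB - RESERVED_GB) * 1e9 / BYTES_PER_PARAM
print(f"~{max_params / 1e9:.0f}B parameters")  # ~216B on one unit
# Two units linked over ConnectX roughly double that, hence the 405B figure.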
Does this run Windows, Linux, or anything like Windows?
Did you watch the video?
@@GaryExplains I watched it, but flipped through.
I see. If you hadn't flipped through, you would have seen that it runs Linux.
Hope you know that workstation is a euphemism for reinforcement training, as in you're working on this but what you are really doing is training your replacement.
So? Imagine the horse trainers stopping car development.....
@ How much do you think AI is useful or good for society? At some point progress is recessive; as a matter of fact, look at all the videos of advanced societies on our planet that are receding because people decided they don't want kids. That's why all this immigration is happening. There's a point when people start losing hope because of all the advancements. Not that you have a choice, it's too late, but who do you think has all these machines? Do you think they always have everyone's best interests in mind? I was looking at a video of a CIA agent saying that the government has computers that can store all the calls etc. of all Americans for the next 500 years. If we had equal access to it when the government does something wrong, I would be OK with it, but do you think that's going to be the case?
@@RealAsItGetz This kind of thing will just speed up AI and robots taking over more and more jobs - problem is, as people have pointed out before; the targets for job automation tend to be those with the highest employee numbers - think factory and warehouse workers - and workers who usually demand a fairly high premium for their work (artists, writers, engineers, film production etc) so if too many people end up losing their jobs, who's gonna buy or pay monthly subscriptions to all the products these robots and AI are producing? All these companies rushing to develop and advance AI, automation and robotics don't seem to understand that they're ultimately shooting themselves in the foot in the long run, all for a short-term boost in profits. LLMs and Apps are also gradually stealing away our access and use of the general internet - even Google results are pushing AI summaries instead of having users GO to the websites it finds. We're slowly giving a select few corporations total control over the internet and how we interact with it/what we do on it. NOT a good thing!
@ they know that they are just fighting for that top spot, it’s obvious to me and you. They are actively working for that. They just can’t stop competing with each other.
How many CUDA cores does it have? Wouldn't a 5090 give more compute power for data mining, albeit at higher power consumption, if we are not seeking AI computing performance? Probably a naive question.
If you are not looking for AI computing performance then this isn't for you.
If it runs Isaac Sim, it would be awesome.
here you go, that will be 1k per 1gb of vram, the more you buy
If you get the same memory and performance from the $3000 entry level model, that would be bonkers cheap!
Can it run Crysis?
This uses an Arm CPU, which will probably mean that only NVIDIA's distribution will be compatible. This is probably a good deal if you need to run AI models without heating your office.
Will this trump 5090s for image and video generation?
Can it run Crysis though?
FP4 is 4-bit floats or 4-byte floats?
bit.
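For illustration, a minimal decoder assuming the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) commonly used for 4-bit floats; the exact format NVIDIA uses is an assumption here:

def e2m1_value(bits: int) -> float:
    # Decode one 4-bit code: sign in bit 3, exponent (bias 1) in bits 2-1, mantissa in bit 0.
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * man * 0.5
    return sign * (1 + man * 0.5) * 2 ** (exp - 1)

print(sorted({e2m1_value(b) for b in range(16)}))
# Only +/- {0, 0.5, 1, 1.5, 2, 3, 4, 6} are representable -- 4 bits, not 4 bytes.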