@@GaryExplains Llama 3.3 70B at 4-bit requires the GPU to read roughly 40GB per token. If the bandwidth is less than 400-500GB/s, the actual performance would be less than 10 tokens/s.
So approx 128GB of VRAM for inference work? So with a quantized model we are looking at an upper limit of, again approx, a 256B parameter model? Is that roughly how that should work?
@@frankjohannessen6383 Given it was designed to connect to a main PC to offload AI processing, I wonder if you can run them in a small cluster? EDIT: sorry, I just found the following on their website... "The GB10 Superchip enables Project DIGITS to deliver powerful performance using only a standard electrical outlet. Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage. With the supercomputer, developers can run up to 200-billion-parameter large language models to supercharge AI innovation. In addition, using NVIDIA ConnectX® networking, two Project DIGITS AI supercomputers can be linked to run up to 405-billion-parameter models."
Hope you know that workstation is a euphemism for reinforcement training, as in you're working on this but what you are really doing is training your replacement.
@ How much do you think AI is useful or good for society? At some point progress is regressive; as a matter of fact, look at all the videos of advanced societies on our planet that are receding because people decided they don't want kids. That's why all this immigration is happening. There's a point where people start losing hope because of all the advancements. Not that you have a choice, it's too late, but who do you think has all these machines? Do you think they always have everyone's best interests in mind? I was looking at a video of a CIA agent saying that the government has computers that can store all the calls, etc., of all Americans for the next 500 years. If we had equal access to it when the government does something wrong I would be OK with it, but do you think that's going to be the case?
@@RealAsItGetz This kind of thing will just speed up AI and robots taking over more and more jobs - problem is, as people have pointed out before; the targets for job automation tend to be those with the highest employee numbers - think factory and warehouse workers - and workers who usually demand a fairly high premium for their work (artists, writers, engineers, film production etc) so if too many people end up losing their jobs, who's gonna buy or pay monthly subscriptions to all the products these robots and AI are producing? All these companies rushing to develop and advance AI, automation and robotics don't seem to understand that they're ultimately shooting themselves in the foot in the long run, all for a short-term boost in profits. LLMs and Apps are also gradually stealing away our access and use of the general internet - even Google results are pushing AI summaries instead of having users GO to the websites it finds. We're slowly giving a select few corporations total control over the internet and how we interact with it/what we do on it. NOT a good thing!
@ they know that they are just fighting for that top spot, it’s obvious to me and you. They are actively working for that. They just can’t stop competing with each other.
How many CUDA cores does it have? Wouldn't a 5090 give more compute power for data mining, albeit at higher power consumption, assuming we are not after AI computing performance? Probably a naive question.
This uses an Arm CPU, which will probably mean that only NVIDIA's distribution will be compatible. This is probably a good deal if you need to run AI models on it without heating your office.
This is at least 8 times cheaper per teraflop than current solutions for the desktop. I am starting to save for this one...
Keep in mind that is FP4 performance. The only big advantage is the large amount of shared memory. The similar product from AMD is the Strix Halo: same 128GB, but with 16 Zen 5 cores, so x86, but likely much lower FP4 performance, though I'm not sure that is relevant given the limited memory bandwidth.
@@electrodacus As far as I know AMD is only FP8. I mean you can run FP4 on it, but you only get the size-reduction benefit for the model. But Ngreedia will be WAY faster at FP4.
@@cajampa Even FP8 AMD performance will be much lower than Nvidia's, but I'm not sure any of that will have an impact on large models where memory bandwidth will be the limiting factor. The AMD AI MAX+ has 256GB/s; not sure about this Nvidia mini PC. I'm not in a hurry, so I will wait to see how they all compare, but since I need a normal Linux computer that can also run LLMs, the AMD will probably be better due to x86 compatibility and a likely much better price.
@electrodacus Good points bro
Exactly. Best news of CES so far. Can't wait for it. I don't get why Intel didn't release Arc cards with multiples of 48GB of RAM, as they can't compete on raw compute power anyway. How do Intel and AMD think they can attract developers to OpenVINO without such offerings? Nvidia shows with offerings like this WHY they are the market leader...
At this price it's gonna fly off the shelves. These AI boxes could be in a class of their own going forward.
How do you know? What is their capacity compared to, say, a 5090?
@@NisseOhlsen The 5090 only has 32GB, but that's designed for gaming and 3D graphics. This includes 128GB and also has a CPU on the same chip, and very likely it'll strip out most of the gaming-specific hardware like shaders and ray-tracing cores.
@@NeonNoodleNexus come on. I'm not a novice, OK? You are confusing 128 GB of RAM with 32 GB of on-GPU VRAM. It's not the same. You understand the difference?
@@NisseOhlsen This is an SoC. The memory is shared with the SoC, like on a games console. For example, the PS5 doesn't have its own separate VRAM; it's all shared.
@@NeonNoodleNexus Wow, thanks! This isn't a 128 GB GPU, right? So that means the 128 GB is not exactly on the GPU, right (otherwise no-one would ever buy an H100 from here on)? So what's the difference? Is this a GPU with 128 GB of RAM on board? If so, what is the latency difference between this setup and a conventional PC with a GPU and 128 GB of DDR4/5 RAM? Concrete example: RTX 5090. Thanks!
This is actually democratizing AI. This is huge! This will make some amazing things happen. This is the coolest thing I've heard in years. This will change computing and the way we compute, and it will change all of our industries very quickly. My jaw literally dropped when they announced this!
Great coverage! It was the part of the presentation that left me with the most questions -- which you mostly answered.
For a dedicated AI machine running at that level, 3k honestly isn't that bad. If you need a cheaper AI dedicated system for smart home use, try the Jensen Nano for $250.
I see what you did there.....😏
Jenson and the Jetsons!
Probably the tiers are:
* Jetson at $250 and 8GB
* Mac mini M4 starting at $600 and 16GB
* Larger Mac mini M4 configurations as needed, up to 64GB IIRC
* DIGITS "GB10" with 128GB
Unless Nvidia brings a price drop for the Orin NX 16GB, of course
$3,000 is the actual price for a Jetson AGX Orin 64GB (6-core CPU with an Ampere GPU, same hardware as the Switch 2).
So for the same price, we get double the RAM and skip one GPU generation? Forget Windows; with that kind of processing power, QEMU with Proton/Wine could be handled by just a couple of cores.
The CPU won't be very powerful.
@@Vanastarr It has 20 cores...
"Starting at $3,000". The 128GB, 4TB, 20-core, 1 PFLOP version will probably be a lot more than $3,000.
@@frankjohannessen6383 well NVIDIA made the call.. it's not for me to make sense of it.. just to buy it if the specs are good and the price is right.
@@garystinten9339 Doesn't matter if they are weak. It will be weaker than a 6 core 9600X.
Even if I end up having to get multiple units so as to scale, I envision being able to run an AGI model locally. This is an automatic buy and is a much more important product than the 5000 series.
Yep! the future is open source, uncensored, locally run AI/AGI! The future looks bright! :)
amen, hail rokos basilisk
@@Machiavelli2pcok I am looking forward to seeing this work out individually for those who can afford and pull it off
AI WILL DO SO MUCH FOR HUMANITY. WE'RE ON THE BRINK OF AGI, WHICH IS GONNA MEAN: MAKING NUCLEAR FUSION A REAL POWER SOURCE, CURING CANCER AND SO MUCH MORE!!!! SUCH A GREAT GREEN FUTURE AHEAD OF US!!! IT'S INCREDIBLE HOW FAR HUMANITY HAS COME WITHOUT ALREADY BLOWING ITSELF UP FIVE TIMES OVER!!! HUMANS ARE SO CRUEL AND SUBJECTIVE, I'M SO FOR AI, AFTER HAVING ACHIEVED SINGULARITY, MAKING DECISIONS FOR US HUMANS AND STRAIGHT-UP SKIPPING ALL THE PEOPLE WE USED TO VOTE FOR, NOT EVEN TO MENTION DICTATORS!!!
MAN I CAN'T WAIT FOR UNIVERSAL BASIC INCOME FOR THE WHOLE WORLD!❤❤❤
FINALLY WE'RE ABOUT TO GET THE SOLUTION TO ALL OF THE WORLD'S PROBLEMS!!!
MAY JENSEN HUANG BLESS GOD!
I find the *"starting at"* part of the price for this to be concerning.
Yes. Likely some variants with fewer CPU and GPU cores, maybe even less memory.
@electrodacus I think it is more about storage. The base model may have 1TB of storage for $3K, and then more storage means a higher price. I doubt we will see any CPU or GPU variants.
@@GaryExplains I'm not sure about that, since the manufacturing process for such large chips is not perfect and they will need to sell those that have small defects. Also, $3K seems way too good.
@@electrodacus Why would they mention connecting 2 if you could just get a more powerful one?
Yeah, I also think that there will be more variants and not just for the storage. They did it with the Jetson AGX Orin as well. Monetarily this makes total sense when you look at the yields of such powerful chips. Selling otherwise worthless ones for cheaper is perfect. Unfortunately, this means the $3K will just be the starting price, but hey - maybe I'm wrong. Wouldn't mind it, if I am ;-)
Most exciting announcement so far
Don’t see anyone talking about it, but this thing is beautiful. Really pretty design.
You missed the teaser that they can connect two of these together.
More than 2
What is the memory bandwidth, did Nvidia or Mediatek say?
This is the question.
The other Grace Blackwell chips have 512GB/s. It also mentioned NVLink-C2C, which was previously specified at 297GB/s.
The Register says it appears to have 6 LPDDR5X chips, and they speculate it has 825GB/s at 8800MT/s.
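A quick back-of-the-envelope check on that speculation (a rough sketch; the 768-bit bus width and the 8800 MT/s figure are assumptions, not confirmed specs for this box):

```python
# Rough LPDDR5X bandwidth estimate: transfers per second * bus width in bytes.
# Assumed values only - NOT confirmed GB10 specs.
mt_per_s = 8800e6        # 8800 MT/s, as speculated
bus_width_bits = 768     # e.g. six 128-bit LPDDR5X packages (assumption)

bandwidth_gb_s = mt_per_s * (bus_width_bits / 8) / 1e9
print(f"~{bandwidth_gb_s:.0f} GB/s")  # ~845 GB/s, same ballpark as the 825 GB/s guess
```

A narrower bus (say 256-bit) would land closer to the ~270 GB/s class instead, so the bus width is the whole question.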
Are we beginning to see the extinction of Windows and Intel?
Maybe but they will start making ARM chips too
Windows can be run on ARM. Why would it end?
Finally! One can only hope.
@@rickjason215 Ehh, it can, but many of the programs can't.
What we are seeing is the extinction of Windows and Intel in the modern AI data center.
That won't affect the general desktop market much. But we will certainly see a rise of Arm-based developer workstations in the future.
They must have been fighting like hell to get that acronym
When the mini PC is faster than your workstation.
For perspective, the price point for a Gateway or Dell back in 1997 was $2,500. That was for cutting-edge tech: a 33.6k modem and an 800-megabyte hard drive with 32 megs of RAM. I think it ran Windows 3.2. So for the capability offered now, it is worth it.
Also for perspective, Roadrunner was the first supercomputer to break the petaflop barrier back in 2008. It would have set you back >$100m and you’d need 2 MW of power to run it. Insane.
That's about $5000 today (it's called inflation)...
what is cooling the thing
Way more performant than a 5090, for just a little bit more money.
For some application scenarios (big LLMs that need a lot of memory) this is true, for others not: slower unified/shared memory, less raw performance.
But it's good that we can choose 😎 The market is becoming more diverse.
NVIDIA is moving in on some industries/vendors:
- Inference services like OpenRouter and GPU cloud providers
- Apple (similar concept to M4 with unified RAM on Mac Mini and Mac Pro)
- Intel/AMD: Clear decision against x86
With the 5000 series, Nvidia is greedier than ever and has monopolistic business practices. However, the new products around Jetson Orin Super and now the DIGITS system fill very interesting gaps in the lower and middle price segment: Robotic, developer, "private use" like homelabs
There is not only (almost) unaffordable consumer hardware and (at least for me as an individual developer) unaffordable enterprise hardware anymore. We also have the possibility to run larger models locally now.
At a price you can't even pay out of your petty cash, but to be honest, if you're in the target market for these systems, this is a more than fair offer.
I would have liked to see such solutions from AMD, along with a competitive software stack. As it is, AMD will continue to lag behind, but this competition is extremely good for us as customers, and I am sure we will be presented with suitable alternatives in the near future.
I'm also pinning my hopes on mini-PC vendors, who haven't made office PCs for long but are getting more creative in how they assemble their systems.
I hope Nvidia won't drop support for it after a few years like they tend to do for their Jetson boards.
@Gary Explains: Two of these benchmarks are misleading, and they are both linked to FP4. It's a 4-bit floating-point number, almost the smallest possible float (and the smallest that can follow IEEE compatibility features).
It says "a petaflop of performance", but it's a petaflop of 4-bit performance.
The worse of the two is the "200-billion-parameter language model" claim. Well, of course: it has 128GB of RAM. With each "parameter" being a single 4-bit float, you just double the RAM in GB to get the total number of FP4 values you can hold locally: 256 billion.
I definitely think it is a very good idea and a very wise risk from Nvidia. Who knows where this will go? I don't. They don't. They probably aren't planning for it to sell in huge numbers. A small form-factor personal computer for AI sounds great though, and it injects a spark of life into the long-dead "workstation" market, which is where Nvidia, who rose from the ashes of SGI, came from. Via the catalyst of Nvidia, the since-demised SGI has been reincarnated, transforming from a graphics workstation vendor into a high-end AI workstation and supercomputer vendor. SGI didn't die, it became the DGX pod. :)
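To make that arithmetic explicit (a rough sketch that ignores KV cache, activations and runtime overhead):

```python
# How many FP4 parameters fit in 128 GB of unified memory, weights only?
memory_bytes = 128e9        # 128 GB
bytes_per_param_fp4 = 0.5   # 4 bits = half a byte per parameter

max_params = memory_bytes / bytes_per_param_fp4
print(f"{max_params / 1e9:.0f}B parameters")  # ~256B, so a 200B model leaves some headroom
```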
Maybe when 32-bit float arrives it will become game-changing in media production.
@JohnnoWaldmann What do you mean by "when 32 bit float arrives"? This NVIDIA box supports FP32, it is just that we don't have any performance numbers at the moment.
@ Interesting, the audio description repeatedly referred to a 4-bit limitation.
What about power consumption and cooling that tiny box?
shhhh
more importantly, can it heat your home? lol
It's a 3nm chip lol... it's an SoC using the latest process node.
Not made by Intel!
needs a nuclear power plant.... but they are selling THOSE too!
Finally, someone who can summarize the basics. TY
Imagine Qualcomm also scaled their Oryon X cores in a similar manner. That would actually be insane.
Is performance (not memory size) comparable to upcoming 5090?
Only for FP4.
Probably not, though the performance per watt could be better.
From the chip picture, it looks like it has 48 streaming multiprocessors (128 CUDA cores each = 6,144), the same as the RTX 5070, so similar performance. Even the FP4 performance is the same in their presentation. But it's suspicious that a 250W GPU would be in such a small box. I bet inference speed at FP16-FP8 will be 0.5-0.8x the speed of an RTX 4090, so probably the performance of a standard RTX 3080 or A6000, but with 128GB of memory. I think that is a great alternative to used 48GB cards ($3,000+); for AI they are pointless now.
Its memory bandwidth is less than an RTX 4070's (LPDDR5 is slower than GDDR6X).
Memory speed is the concern with low-power DDR5X; it typically runs at 8.5Gbps, which is very slow.
Will it be possible to use it with Stable Diffusion, LM Studio or Hunyuan Video? Will it be faster than my current Nvidia 4070 Ti?
We don't know the performance yet.
Nice, but where's the new Nvidia Shield Pro to replace my 2017 model?
I still have the OG 2015 version and am ready for a new one
Thanks Gary for explaining the product. I was watching the Nvidia's livestream. Didn't understand what this thing was.
So basically it's a device specifically built to run local LLMs, right? So now it can take the crown from the Mac Studio for being the more affordable option to run LLMs locally?
I have one more question: can it be run as a cluster to run even bigger models than 128GB allows?
Yes it can! You can link up 2 to run 405B models in FP4. I don't see why you could not link up more, since it just uses ConnectX.
Wow 405b locally 😮
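That 405B figure lines up with simple FP4 sizing, assuming the weights dominate memory use (a rough sketch, not official sizing; KV cache and overhead ignored):

```python
# Does a 405B-parameter model at FP4 fit on one or two linked DIGITS units?
params = 405e9
weights_gb = params * 0.5 / 1e9                        # 4-bit weights -> 0.5 bytes/param
print(f"weights: ~{weights_gb:.0f} GB")                # roughly 200 GB
print("fits in one 128 GB unit:", weights_gb <= 128)   # False
print("fits in two linked units:", weights_gb <= 256)  # True
```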
Very interesting to see how this comes along - Apple silicon competitors I am all for.
How does it handle heat ?
the 3k might be a deal, depends on how well this competes with the apple silicon, does the gpu get the benefits of the complete unified memory like the apple m series?
average dumb apple user, of course it does, and it performs way better than apple silicon
@ ok fan boy
So these things have an mediatek arm cpu with an nvidia GPU.
How are these related or comparable to super computers?
Are there many useful or widely used models based on FP4 parameters? I really don't know but my guess would be: no. So what does that petaflop headline performance reduce to in FP8? Does it scale linearly with size, so, halved?
Most of the open source models like Llama and Gemma have 4bit versions. If you run an LLM on your desktop using Ollama it will likely be a 4 bit version. With LM Studio you can opt for which quantized version you download. Same with llama.cpp.
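For example, a minimal sketch with llama-cpp-python, assuming you already have a 4-bit GGUF file downloaded (the model path below is a placeholder, not a specific release):

```python
# Load and run a 4-bit quantized GGUF model locally with llama-cpp-python.
# Q4_K_M is a common 4-bit quantization variant; the file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit
    n_ctx=4096,
)

out = llm("Explain FP4 quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```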
@@GaryExplains Aha! So perhaps no need to caveat that performance number. Thanks for the extra info.
The caveat is because the FP8, 16 and 32 performance numbers will be different.
@@GaryExplains Right, yes, it needs the qualification because otherwise the number is meaningless, but it doesn't need a *warning* like "OK but fair warning, in practice," etc etc. Apparently it *is* a practical number.
@@GaryExplains Yeah, but with 4-bit the quality drop is significant for precision tasks like coding. I run at 6-bit at the very least, preferably at 8. So I'm not sure how useful this box will be. Maybe next gen, with faster bandwidth.
For what purposes would you use this mini PC? Private use or for a company?
Researchers, AI developers, education and research, small/medium businesses deploying AI solutions.
Oh, I was expecting a price in the ballpark of 10 grand.
me too! That's a bargain for that performance and memory in my opinion!
"prices starting at $3000". My guess is the version with the provided specs will be around $12.000, and the $3000-version will have a scaled down cpu&gpu, 32GB ram and 1TB storage. Won't mind being wrong though.
This really is a steal going by the specs. But benchmarks are needed. It bugs me that he didn't show any.
How is this mini PC cooled? I don't see an internal fan in the picture..
When is the GB20 coming? I can't afford the GB10 at USD 3,000, and I need 2 of them; that is my issue.
Can it run stable diffusion?
Of course
Yeah, I'm going to get two of these. This should be interesting.
An interesting comparison would be against a desktop with a high-end graphics card like a 4090; yes, the form factor isn't as nice, but I wonder about the price-to-performance ratio.
The limiting factor there is the VRAM. You have to fit your model into 24GB. With this, even if the computation is slower overall, the fact that you can even run a 128GB model is amazing. I can wait an extra couple of seconds and access something the 4090 isn't capable of.
1. To run the same models, you would need 4x 5090 (32GB x 4), which would be crazy expensive, both upfront and in energy consumption.
2. Even though clock rates might be slower, this is never the limiting factor in AI systems. The limiting factor is memory transfer speed (which this is optimized for). In both training and inference, GPU cores are idle more often than not, 'waiting' for data to be transferred.
@@mairex3803 and 122333 That is the kind of comparison I was looking for. While I am not currently planning on training AI systems, that shows the value of the mini PC.
At 3:10, the video demonstrates that the memory isn't integrated into the chip but consists of 8 (or 16?) LPDDR5x modules surrounding the SoC.
Wow, that's bad, the M4 Max will wipe the floor with this.
It would be good if you could connect this to your 5090 directly to get it to help 'build your AI', then run it standalone. Maybe NVLink over one of, or instead of, the 3 DP connections?
I want to know how powerful the built in blackwell GPU is. I'm guessing it exceeds the 5090 in terms of compute
@@MA-jz4yc Yes, the Grace Blackwell GPU is 1 PFLOPS (at FP4). The 5090 is a little less than that, so almost the same, but with the extra unified memory.
@@sorucrab "The RTX 5090 boasts **3,352 trillion AI operations" hopefully it's 'apples to apples" but I also found "GB10 Grace Blackwell Superchip delivers **up to 1 petaflop of AI performance** at FP4"
I'm sure the 5090 would be useful in certain cases - maybe FP8 based.
Gonna buy both either way..
@@MA-jz4yc GB10 Grace Blackwell Superchip delivers **up to 1 petaflop of AI performance** at FP4. Can't find 5090 FP4 info.
The limiting factor is memory transfer speeds... Which this is optimized for with its unified memory. No consumer GPU can compete. The exact FLOPS do not matter, as GPU cores are more often than not idle waiting for memory being transferred most of the time
I guess it would make a lot of sense keeping everything local; you could run several LLMs at once or use it for training. But do you need it if you can run, say, 3-5 Jetsons doing LLMs the same way?
bingo!
I replaced my older, big desktop with a Khadas Mind Premium mini computer in November 2023 and I love it. Added on their Mind Graphics unit in 2024 and love it even more. The Khadas Mind Graphics unit uses an Nvidia GeForce RTX 4060 Ti desktop GPU and has 8GB of GDDR6 memory. It is an awesome setup. The computer and graphics unit only cost me $1,640.00 because I got it on Kickstarter when they first started doing their preorders. Might be interested in upgrading to this new Nvidia mini PC, but $3K is twice what I spent for mine, so I will have to see the reviews and specs first, I guess.
It's not a PC. This is basically a mini server and most will use a secondary device to interface with it
@@MrTmansmooth Ah, got it. I was thrown off by the caption that says new mini pc. He should probably clarify that a bit for novices like me.
So, when and where can this be bought?
The HP Z2 Mini G1a is coming out around the same time. It's based on AMD Strix Halo and will have very similar capabilities (also 128GB of unified memory shared between the CPU and GPU).
The problem is that NVIDIA is the industry standard for AI because of CUDA, not AMD. Using an AMD workstation for AI is currently not advisable. CUDA is the de facto standard.
@@GaryExplains ROCm is apparently coming along nicely. Yes, CUDA is currently going to be easier but that's not to say that alternatives aren't starting to gain traction. And if you want something that can also be a general purpose PC then the argument can be made that that will be easier with AMD's x86 compatible cores than it will be with Digit's ARM cores. I'm still undecided about which direction I'll take. I worry about NVidia's support for hardware like this - it requires a specially modified Ubuntu 22.04. How long will they support that? It's already 3 years behind.
Nah, an NVIDIA small-form factor PC with 128 GB of RAM & 8 TB SSD with AI and ARM-based chips would absolutely be the correct configuration to sell - the rest is too "also ran" in nature to compete with Apple currently. Apple offers this config on their higher priced MBPs with an M4 Pro chip already. I think people would most certainly buy this mini Windows PC at $3k. Agreed; at that config and price, "it will fly off the shelves"!
What would the performance be against A100/H100 cards?
Disclaimer: We don't know the actual performance, so this is an estimate based on historical trends.
With that said: generally for this class of product, Nvidia lists a low-precision operation (FP4) in a sparse configuration (you are almost certainly not taking advantage of that if you need to ask), so for a lot of realistic use cases you're probably going to want to divide the listed operations by about 8. I.e., you would expect this to perform like a 125 TFLOP GPU.
The A100 has ~80 TFLOPs FP16, but the BF16 performance is probably fairer (because I'm not willing to go to the whitepaper to find the tensor core performance), so 311 TFLOPs we could say.
Based on historical trends, I'm guessing the bandwidth of DIGITS is more similar to the Jetson models than their HBM GPUs, so you would expect somewhere around 125 to 350GB/s of bandwidth with a reasonably high degree of confidence, compared to the 1.5TB/s of the A100.
All in all: If I were looking at buying one, I would expect to get around 1.5x (arguably 1.8x in cases where you were using two A100 40GB cards with some networking losses) to 1/8th the performance of an A100 depending on the specific use case. For low context Transformer inference you'd expect closer to the 1/8th mark, but for convolutional nets it could be anywhere in the 1.5x to 1/3 mark depending on the exact network and its datatypes.
This is assuming existing models, and future ones might be more dependent on FP4 operations, or might take advantage of sparsity in an intentional way, so you could potentially get better performance in the future, but there isn't currently a clear path to models you would want to run using those advancements, so I wouldn't bank on it personally.
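For what it's worth, the arithmetic behind that "divide by about 8" heuristic looks like this (a sketch; the 2x sparsity factor and the 2x-per-precision-step scaling are the assumptions above, not published numbers):

```python
# From the 1 PFLOP sparse-FP4 headline number to a dense-FP16 estimate.
listed_tflops_fp4_sparse = 1000.0            # 1 PFLOP, as marketed

dense_fp4 = listed_tflops_fp4_sparse / 2     # drop the 2:4 sparsity assumption -> ~500 TFLOPS
dense_fp8 = dense_fp4 / 2                    # ~250 TFLOPS
dense_fp16 = dense_fp8 / 2                   # ~125 TFLOPS, the "125 TFLOP GPU" figure above

print(dense_fp16)  # 125.0
```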
@@novantha1 I'm not sure about the bandwidth. Would Nvidia call this a "mini AI supercomputer" if the bandwidth were similar to an Apple Mac mini with an M4 Pro chip or the upcoming Strix Halo APU? I'm hoping Nvidia has something more than that up their sleeve.
@@frankjohannessen6383 “Would the company that always hypes up their products with absurd monikers hype up this new product line?”
Yes. Yes they would. They’re calling it a supercomputer in the sense that it can fine tune decently sized language models, not because it’s crazy powerful. They already almost certainly lied about the FLOPs for practical purposes (by giving us sparse operations), so my expectation is that the bandwidth isn’t too crazy. Keep in mind that it’s LPDDR; I’ve never seen an LPDDR system go beyond maybe 200GB/s on the upper end.
@@novantha1 Thank you for taking the time. Right now, what should be, in your opinion, the option to go for? I was considering a System76 x86 workstation with dual RTX 6000 Ada, but now should I wait for System76, or is there any other alternative? Thanks in advance...
obvious opportunity for AMD to make a 256GB version for 2000$
Won't matter. AMD's software stack for AI work is pretty terrible. See recent news/details by SemiAnalysis (Dec 22, 2024 post), where they ran into multiple roadblocks getting things to work on the MI300X when they were doing performance evals/tests against its Nvidia counterparts. Nvidia has been investing heavily in CUDA for years, while AMD has not, and this won't change any time soon, especially as Nvidia keeps updating and adding to CUDA.
@@shadow7037932 Thanks for the pointer, but that wasn't my takeaway from the verbose, whiny article. I'm sure there are some software issues, and that's pathetic, because getting that right might be worth $10bn or more to AMD, so you pay to make it happen at all costs. But I note two key things from the article (which should have been a YouTube video instead). On 16-bit training, the AMD card beat both the H100 AND the H200 on Llama 8B and 70B; indeed, it seems the larger the model, the more AMD shines. And they never tested a model bigger than could fit in H100 memory. Almost as if they are biased little... The second very important thing I noticed is that they never tested inference, which would be large-scale user usage in data warehouses, but they say an article is coming, and bandwidth is very important for inference. Take a wild guess whether that means AMD will win there too. A side note is that the H200 barely performed better than the H100, which is rather puzzling for Nvidia, but that's another matter. I note that when they use the very recent WIC build, whatever that is, it does rather well. For all the problems they were having with software, it sounds like it's approaching usable, and indeed not only comparable to but beating Nvidia. And they never even tested a model that couldn't fit in H100 memory. That's lame.
For investments as huge as the big companies are making, they should be able to manage whipping the software the last few steps to usable; it doesn't have to be super elegant to set up before it becomes decisive. And all their testing was in large clusters; they never tested just a single unit. AMD's cards might beat Nvidia's rather savagely for models larger than can fit in H100 memory. Think about that. They were testing 1.5G-param models, and no one uses models that small. If the software is bordering on usable, AMD should undercut the price savagely, like $5K for a 256GB card, and murder Nvidia. Same on consumer cards, and on the miniature box.
@@shadow7037932 Agreed. Software is arguably the only moat that Nvidia has.
Genuine question: why can't the Switch 2 be a cut-down version of this?
Edit: I just watched till the end, and I somewhat understand. But surely cutting this down by an eighth should work out?
What is this computer for? Can I use it for gaming or something more casual?
Did you watch the video?
How does this compare to the 5090 at $2K in terms of 32-bit FP ops?
Apples to oranges comparison: 32GB vs 128GB. The 5090 will be faster on 32GB models or smaller, but slower than the GB10 for anything above 32GB, as the RAM is the limiting factor. Not sure what the memory bandwidth will be for the GB10, though. And models are usually used at FP16 at most.
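As a rough illustration of why capacity, not raw speed, decides which box can even run a given model (weights only; KV cache and overhead ignored):

```python
# Which models fit in 32 GB (RTX 5090) vs 128 GB (GB10)? Weights only.
def weights_gb(params_billions, bytes_per_param):
    # 1 billion params * 1 byte/param = 1 GB
    return params_billions * bytes_per_param

for name, params_b in [("8B", 8), ("70B", 70), ("200B", 200)]:
    fp16 = weights_gb(params_b, 2.0)   # FP16: 2 bytes per parameter
    q4 = weights_gb(params_b, 0.5)     # 4-bit: 0.5 bytes per parameter
    print(f"{name}: FP16 ~{fp16:.0f} GB, Q4 ~{q4:.0f} GB "
          f"(fits 5090 at Q4: {q4 <= 32}, fits GB10 at Q4: {q4 <= 128})")
```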
This would be a game changer for large batch training diffusion models, I wonder how hard it will be to get lol.
The cutdown MediaTek Windows version might be a bit more home affordable
4 bits? It's FP4, floating point 4.
I think that is generally less accurate for training and better for performance, but if it does FP16, which is generally the most common and more accurate, it could be a quarter of the performance when training at FP16.
will it be stronger at training or inference?
IMHO this doesn't really have any competition in hardware, but it will have fierce competition in software/quantization. Do you really need a 200B-param model? Or can you get by with a quantized 8B or 16B parameter model that calls experts in the cloud?
What's the I/O on a machine like this? Or is that TBD
So we can run Ubuntu Linux and make clusters?
Yes, NVIDIA mentioned somewhere about using two of these together via ConnectX.
Hello, I'm not very knowledgeable about these things, but are we sure that you can only connect 2 of these? There are 2 ports I don't recognize on the back, and it seems like, if this is a scaled-down server architecture, linking together multiples of these would be more useful.
1:25 apparently the acronym was also generated with 4-bit precision
And how to connect to this box? Is it Thunderbolt 4 or what? Can it be used with a laptop, for running local LLMs?
This isn't an add-on, it is a complete computer.
SSH
How many TOPS?
1000
@ thanks
It's got 4x the memory of a 5090, but I suspect it'll run slower. Hopefully the 5090 will have hardware 4-bit quantization too.
The 5090 won't even run many of the models that this can. The 5090 is designed for graphics, otherwise it would have a lot more memory.
What applications are available for the Mini PC?
Are you familiar with Linux?
@@GaryExplains Yes.
Then, in terms of available applications, consider it to be the same as any other Arm based Linux computer.
@@GaryExplains Thanks
How does it compare to M4 Max? Can M4 Max also run 200B model locally?
How much RAM in this Mac?
@ It can go up to 128GB. But I wonder how big the difference in GPU/NPU performance is for deep learning in terms of speed.
The ability for an M4 Max based machine to run a 200B models depends on the memory not the GPU/NPU.
@ Yes. But I was referring to pure compute power of M4 Max vs DIGITS GPU because that will determine the token/second for the LLM.
But you asked about a 200B parameter model; you can't run it if you can't load it into memory. Then, once you have it in memory, memory bandwidth is also a key factor, not just the GPU/NPU, because you basically need to access 128GB of RAM per token. I am only saying this because you specifically mentioned 200B models.
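To put rough numbers on that: for memory-bandwidth-bound generation, tokens/s is roughly bandwidth divided by the bytes read per token. A crude sketch (the bandwidth values are placeholders for illustration, since Nvidia hasn't published one for this box; a dense model is assumed):

```python
# Crude bandwidth-bound estimate for a dense 200B model at 4-bit (~100 GB read per token).
weights_gb = 200e9 * 0.5 / 1e9   # bytes of weights, expressed in GB

for bw_gb_s in (273, 500, 825):  # assumed bandwidth figures, not confirmed specs
    print(f"{bw_gb_s} GB/s -> ~{bw_gb_s / weights_gb:.1f} tokens/s")
```

So even with optimistic bandwidth, a 200B dense model would be single-digit tokens per second.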
MediaTek produces ARM based chips, which use less energy, so also produce less heat.
I guess for Windows desktop use you can just connect to it remotely.
And Nvidia produces Arm based chips which use less energy, so also produce less heat. I am still unclear about MediaTek's collaboration in this venture.
I want to see the performance of this mini pc
could this be a low power mining rig?
Why didn't they specify VRAM quantity? Is it unified with the 128GB of DDR5 RAM?
Unified yes.
In my thesis I process large mathematical databases. Do you think this GB10 Mini PC could be used? I have no experience with Linux.
Unfortunately I would need more information to give any kind of useful answer. How large? What type of processing?
Thank you for giving good info on this
Missing power consumption figures
Who cares? It's not battery powered, and it's also self-contained.
@@MrTmansmooth It's very important for the environment and the pocketbook if you're running LLMs 24x7.
With it being a mini-PC you now have me wondering where the 1kW power brick is located...
Thank you for showing me this. I would turn it into a coin farmer, use it in part to run games, plus stock market learning. I just want to get it now before the tariffs hit the market.
And Gavin Belson unveils the box!
Education pricing ??
You mentioned Windows, so I have a question. Why would you consider putting a Lambo engine in an Edsel?
Looks to me like a dev starting/entry point for what might come in the future for the consumer market. MediaTek is clearly sharing some of their CPU know-how alongside the Nvidia GPU; even though they are better known for using off-the-shelf Arm CPU cores without an architectural license, they can still optimize them better than Nvidia would alone. Also, let's not forget that MediaTek is a Taiwanese company, which in general makes them close to Jensen, though probably not as close as TSMC 😉. Collaboration with MediaTek in the past was primarily focused on the automotive industry, however the unofficial plan was always to let it grow further. It would be great to see all of this coming to fruition and to get the fruits of this close collaboration in the shape of a consumer Arm SoC using the latest Nvidia GPU cores for a reasonable price close to the M4 Mac Mini, running of course both Linux and Windows on Arm. $3K is certainly not targeting mass consumers, however for AI devs and as a high-performance edge AI machine it might be a great option to go with.
This is not a consumer product. If they develop one, the price of this product will not transfer to an entirely different one.
You should have explained the advantage of this over a standard setup with e.g. 4 RTX 5090.
You think 4xRTX 5090 is a "standard setup" 🤐
Maybe they should look up the alphabet once more :D
But in the video there are 8x16GB memory chips next to the SoC itself making up the 128GB, rather than being on the chip, and the same goes for the NVMe storage. So it's not on the chip, unless you count the board as the chip.
Do you reckon you could put SteamOS on it?
I see this as also a devkit for their Windows computers, which will probably use a cutdown version of these chips
It's Arm.
@SWOTHDRA so?
I will want to see the benchmark results with a Windows emulator.
Can it play the latest Indiana Jones PC Game at 4k?
Arm CPU, so no games.
should be able to.
@@Machiavelli2pc lolllll...
Excited for this, but I'm an AI/ML engineer, so it's applicable. Not sure if there would be any other audience for this, but I am on the notify list and will be getting it.
The limitation is obvious. It's implausible to run at 30 tokens/s with Llama-70B-Instruct at 4-bit; more like a mere 10 tokens/s.
Eh? Where are you getting these numbers from?
@@GaryExplains Llama 3.3 70B at 4-bit requires the GPU to process about 40GB per token. If the bandwidth is less than 400-500GB/s, the actual performance would be less than 10 tokens/s.
But we don't know the memory bandwidth. So you are just guessing.
@@GaryExplains It's easy enough to make an educated guess since it's mainly LPDDR5X.
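To put numbers on that educated guess (these are assumptions, not measured specs):

# Llama 3.3 70B at 4-bit is roughly 40GB of weights, and every generated
# token needs those weights streamed from memory once.
weights_gb = 40
for target_tok_s in (10, 30):
    print(f"{target_tok_s} tok/s needs ~{target_tok_s * weights_gb} GB/s of bandwidth")
# 30 tok/s would need ~1200 GB/s, which LPDDR5X is unlikely to deliver;
# ~400 GB/s, which is plausible, gives about 10 tok/s.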
FP4... isn't that kind of nerfed? I would like FP8 minimum, then I might start getting excited.
Obviously it can handle FP8 and 16 and 32, we just don't have the performance numbers for those yet.
So approx 128GB of VRAM for inference work? So with a quantized model we are looking at an upper limit of, again approx, a 256B parameter model? Is that roughly how that should work?
There is some overhead with GGUF quantization and you also need RAM for the context window. I'd say an upper limit of about 210B.
@@frankjohannessen6383 Thx
@@frankjohannessen6383 Given it was designed to connect to a main PC to offload AI processing, I wonder if you can run them in a small cluster? EDIT: sorry, I just found the following on their website... "The GB10 Superchip enables Project DIGITS to deliver powerful performance using only a standard electrical outlet. Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage. With the supercomputer, developers can run up to 200-billion-parameter large language models to supercharge AI innovation. In addition, using NVIDIA ConnectX® networking, two Project DIGITS AI supercomputers can be linked to run up to 405-billion-parameter models."
Nvidia themselves state 200B param models in FP4. You can connect 2 of these for 405B
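A rough sizing sketch behind those limits (the reserved-memory figure is my own assumption, not an NVIDIA number):

MEM_GB = 128
RESERVED_GB = 20       # assumed: OS, runtime, GGUF overhead, KV cache / context
BYTES_PER_PARAM = 0.5  # 4-bit weights

max_params = (MEM_GB - RESERVED_GB) * 1e9 / BYTES_PER_PARAM
print(f"~{max_params / 1e9:.0f}B parameters")  # ~216B on one unit
# Two units linked over ConnectX roughly double that, hence the 405B figure.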
Does this run Windows, Linux, or anything like Windows?
Did you watch the video?
@@GaryExplains I watched it, but flipped through.
I see. If you hadn't flipped through, you would have seen that it runs Linux.
Hope you know that workstation is a euphemism for reinforcement training, as in you're working on this but what you are really doing is training your replacement.
So? Imagine the horse trainers stopping car development.....
@ How much do you think AI is useful or good for society? At some point progress is recessive; as a matter of fact, look at all the videos of advanced societies on our planet that are receding because people decided they don't want kids. That's why all this immigration is happening. There's a point when people start losing hope because of all the advancements. Not that you have a choice, it's too late, but who do you think has all these machines? Do you think they always have everyone's best interests in mind? I was looking at a video of a CIA agent saying that the government has computers that can store all the calls etc. of all Americans for the next 500 years. If we had equal access to it when the government does something wrong, I would be OK with it, but do you think that's going to be the case?
@@RealAsItGetz This kind of thing will just speed up AI and robots taking over more and more jobs - problem is, as people have pointed out before; the targets for job automation tend to be those with the highest employee numbers - think factory and warehouse workers - and workers who usually demand a fairly high premium for their work (artists, writers, engineers, film production etc) so if too many people end up losing their jobs, who's gonna buy or pay monthly subscriptions to all the products these robots and AI are producing? All these companies rushing to develop and advance AI, automation and robotics don't seem to understand that they're ultimately shooting themselves in the foot in the long run, all for a short-term boost in profits. LLMs and Apps are also gradually stealing away our access and use of the general internet - even Google results are pushing AI summaries instead of having users GO to the websites it finds. We're slowly giving a select few corporations total control over the internet and how we interact with it/what we do on it. NOT a good thing!
@ they know that they are just fighting for that top spot, it’s obvious to me and you. They are actively working for that. They just can’t stop competing with each other.
How many CUDA cores does it have? Wouldn't a 5090 give more compute power for data mining, albeit at higher power consumption, if we are not seeking AI computing performance? Probably a naive question.
If you are not looking for AI computing performance then this isn't for you.
If it runs Isaac Sim, it would be awesome.
here you go, that will be 1k per 1gb of vram, the more you buy
If you get the same memory and performance from the $3000 entry level model, that would be bonkers cheap!
Can it run Crysis?
This uses an Arm CPU, which will probably mean that only NVIDIA's distribution will be compatible. This is probably a good deal if you need to run AI models without heating your office.
Will this trump 5090s for image and video generation?
Can it run Crysis though?
FP4 is 4-bit floats or 4-byte floats?
bit.
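For illustration, a minimal decoder assuming the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) commonly used for 4-bit floats; the exact format NVIDIA uses is an assumption here:

def e2m1_value(bits: int) -> float:
    # Decode one 4-bit code: sign in bit 3, exponent (bias 1) in bits 2-1, mantissa in bit 0.
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * man * 0.5
    return sign * (1 + man * 0.5) * 2 ** (exp - 1)

print(sorted({e2m1_value(b) for b in range(16)}))
# Only +/- {0, 0.5, 1, 1.5, 2, 3, 4, 6} are representable -- 4 bits, not 4 bytes.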