This is EXACTLY what I have been searching for the last 10 days!
@ 0:43 The base model has TB4, not TB5. The quoted example uses not the $600 base version but the $1,400 Pro version, which is therefore able to use TB5.
You are right, but even TB4 should be enough for daisy-chaining.
What I came to say
Thanks!
If your compute needs actually call for clustering to increase performance, then you should have already exceeded the performance of any maxed-out single node. Budgets far above us normies.
The real problem, though, is that Apple has had hardware enterprise wanted to use before and did not support that use at all. They target consumers, not fleets/data centers. They discontinued their Xserves long ago.
@@ianTnai The node topic is just an answer to the rip-off mentality in Apple's pricing policy, which charges 6 to 8 times the appropriate price for storage and RAM. A maxed-out Mac Mini M4 (double the storage and double the RAM) costs more than two base Mac Mini M4s, where you get a second PCB, a second PSU, a second case, a second software license, a second power cord, a second box, and the ability to set up a second workspace.
If you price all those items according to Apple's own scheme (PCB $50, case $100, software license $100, power cord $30, packaging $10), they add up to roughly $300. So the maximum acceptable upgrade price for going from 16GB to 32GB of RAM plus 256GB to 512GB of storage would be roughly $300 in total, and I bet upgrade prices of $100 for each increment (still massively overpriced, by a factor of 4) would be a great success as shelf products (manufactured with much higher productivity than custom builds).
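To make the arithmetic explicit, here is the same back-of-the-envelope as a short Python sketch. The $600 base price and all component prices are the rough assumptions above, not Apple's actual bill of materials.

# Back-of-the-envelope for the "fair upgrade price" argument above.
# All prices are rough assumptions, not Apple's actual costs.
BASE_PRICE = 600  # USD, base Mac Mini M4 (16GB RAM / 256GB SSD)

duplicated_parts = {  # hardware a second base unit duplicates
    "PCB": 50,
    "case": 100,
    "software_license": 100,
    "power_cord": 30,
    "packaging": 10,
}

overhead = sum(duplicated_parts.values())  # 290 USD
upgrade_ceiling = BASE_PRICE - overhead    # ~310 USD for the RAM+SSD bump

print(f"Duplicated-hardware overhead: ${overhead}")
print(f"Implied ceiling for +16GB RAM / +256GB SSD: ${upgrade_ceiling}")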
You're not chaining together $600 units that use Thunderbolt 5. Those units are Thunderbolt 4.
Apple should compete with Nvidia by making racks of AI hardware.
Deploying your own AI has never been easier since the arrival of Ollama and Hugging Face.
hang face brotha🤙
I wanna know what the poster with the price tag says in full!
What they don't realize is they're putting hot air on top of their Minis with this stacked setup.
They're aware bro... It's just for a pic
Would that same stack turned horizontal be better?
@@mvargasmoran Horizontal, and opposite-facing for each bank. Like a server farm.
Yeah, consuming 3 watts at idle is a real problem /s. THE EFFICIENCY, SIR.
Drill holes and install case fans.
Exo adds a lot of overhead, which slows down tokens-per-second responses. Adding more than 4 Mac Minis requires a hub, which also slows things down. Maybe Exo will get better, but only time will tell.
Yeah, efficient CPU and GPU muscle stacking, but system memory bandwidth is not that great in the Mini. Huge bottleneck for AI. I don't think this is really a good option.
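A rough roofline sketch makes the bottleneck concrete: single-stream LLM decoding has to read essentially all model weights from memory for every token, so bandwidth caps tokens per second. The bandwidth numbers below are approximate published specs, and real throughput is lower.

# Roofline upper bound: tokens/s <= memory bandwidth / model size.
def max_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_gb

MODEL_GB = 8.0  # e.g. an 8B-parameter model quantized to 8 bits
for name, bw in [("M4 (base Mini)", 120), ("M4 Pro", 273), ("RTX 4090", 1008)]:
    print(f"{name:>14}: <= {max_tokens_per_sec(MODEL_GB, bw):6.1f} tok/s")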
I remember another YouTuber specifically testing this, and at some point he just ran the same LLM on the terminal of ONE Mac Mini and it was four (4) times faster. Unless the cluster software has drastically improved since then, you're probably right.
@@TimoBirnschein Exo is slower; MLX is faster but requires more setup.
Links to the tweaks and articles, man.
Waiting for the M4 Studio, and then we'll stack.
Did you have to use their os/software? Did you have to create an Apple account?
Asahi Linux does not yet support the M4, but it's on their roadmap. For now you would have to use macOS.
You are wrong, the $600 Mac Mini does not have Thunderbolt 5, only 4.
Do the stacked Mac Minis work as one computer?
Not in the way you may think; they form a computing cluster. The Mac Minis communicate collectively with each other to solve a distributed problem.
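For example, with a tool like Exo you start the daemon on every Mini, the nodes discover each other over the network, and the cluster then serves a ChatGPT-compatible API you can query from any machine. A minimal sketch; the port and model id below are assumptions, so check your Exo version's startup log and docs for the actual values:

# Assumes `exo` is already running on each Mini and the nodes have
# auto-discovered each other; the URL and model id are assumptions.
import requests

EXO_URL = "http://localhost:52415/v1/chat/completions"  # check your startup log

resp = requests.post(EXO_URL, json={
    "model": "llama-3.2-3b",  # hypothetical id; use a model your cluster serves
    "messages": [{"role": "user", "content": "Summarize what a cluster is."}],
}, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])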
If anyone tries this… please let us know! Looking to run something along the lines of Llama 3.2 90B FP16…. (It's the FP16 bit that will create the need for 4 MMPs (Mac Mini Pro) to build the required ~200GB memory bank!)
Very curious… obviously, via Exo et al., you will be able to pool your unified RAM. The question is… do the 4 M4 Pro chips also all get to pitch in on the TPS (tokens per second) number? Or is it all bottlenecked by only utilizing 1 CPU?
Yes, at FP16 it's going to be slow… but all we need is 7.74 tps! 4 would work (though you'd age twice as fast working with it!)
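The ~200GB figure checks out as a back-of-the-envelope. The 1.1× overhead factor for KV cache and runtime is a rough guess, and this ignores macOS's limit on how much unified memory the GPU may claim:

import math

params_b = 90                 # Llama 3.2 90B
weights_gb = params_b * 2     # FP16 = 2 bytes/parameter -> 180 GB of weights
needed_gb = weights_gb * 1.1  # ~198 GB with rough KV-cache/runtime overhead

MINI_PRO_RAM_GB = 64          # maxed-out M4 Pro Mac Mini
print(f"Weights: {weights_gb} GB; with overhead: ~{needed_gb:.0f} GB")
print(f"Minis needed: {math.ceil(needed_gb / MINI_PRO_RAM_GB)}")  # -> 4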
When you say, "all we need is 7.74 tps," what did you mean by that? Why that rate of tokens-per-second specifically?
Expensive way to do it.
How many Rs in Strawberry?
Enough Rs to make an AI question its life choices
So, Apple, there you have your use case for a new Mac Pro variant!!!!
How are you linking them? Do you need software to combine the power?
What’s the double pendulum next to you?
Yes AI homelab is cool but this is what I would like to know :-D
swinging sticks
Bullshit. It is like buying a stack of Toyota Priuses to haul a new sofa from IKEA...
Classic Apple pricing. Baseline but OK performance for $, but to get the performance Apple boasts about, well then, you're looking at $$$$$!! They must use extortion as their core business model.
Is there still a shortage of Raspberry Pis? Otherwise, why are people using these? This seems crazy!
Except the Mini with TB5 is $1,400, not $600. Although even with TB4 it's faster than 10Gb Ethernet.
My mind is blown... How rich these people are 🥲
You can get three to four base-model Mac Minis for the price of an RTX 4090, and these will run much larger models. A bit slower than the RTX 4090, but quieter, and they draw less power. Ideal for homelabbers with an interest in AI.
Cannot wait for the $1,000 Mac Mini RACK servers!!!!
Racknex of Austria offers rackmount enclosures for the Mac Mini. You can fit two Minis in 1.33U; they also have covers to make it 2U.
Spinal tap
No, please stop, these things are "while supplies last"...
Imagine stacking 4 × M4 Ultra Mac Studios with 256GB unified memory each (1TB total)... OR 4 × M4 Max Mac Studios with 128GB each (512GB total). That's some SERIOUS AI workload capacity, a huge LLM, and memory bandwidth 🤣
Here's the recalculated comparison based on the updated Mac Mini price of ₹60,000 and a more optimized multi-GPU setup using a shared motherboard and minimal storage:
1. M4 Mac Mini Cluster
• Price: ₹60,000 per unit.
• Number of Units: ~25 units (₹60,000 × 25 = ₹15,00,000).
• Specifications per Unit:
• CPU: 10-core Apple M4 (4 performance, 6 efficiency cores).
• GPU: Integrated 10-core Apple GPU.
• RAM: 16GB unified memory.
• Storage: 256GB SSD.
• Power Consumption: ~40W per unit (~1,000W for 25 units).
• Cluster Total:
• CPU Cores: 250 cores (10 × 25).
• GPU Cores: 250 cores (10 × 25).
• RAM: 400GB unified memory (16GB × 25).
• Storage: 6.4TB SSD (256GB × 25).
• Performance Notes:
• Pros: macOS-native optimization, efficient power usage, and great for lightweight AI/ML workloads.
• Cons: Limited GPU power for deep learning or robotics. Integrated GPUs are far slower than discrete GPUs.
2. Custom Multi-GPU System with RTX 4090s
• Optimized Build:
• Motherboard: ASUS TRX40 or equivalent (supports 4 GPUs, ~₹70,000).
• CPU: AMD Ryzen Threadripper 3960X (24 cores, ~₹1,00,000).
• GPUs: 4× NVIDIA RTX 4090 (~₹1,75,000 per GPU × 4 = ₹7,00,000).
• RAM: 64GB DDR4 (~₹30,000).
• Storage: 1TB NVMe SSD (~₹8,000; storage for OS and minimal datasets).
• PSU: 1600W Gold (~₹30,000).
• Chassis and Cooling: ~₹50,000.
Total for 4-GPU Setup: ~₹10,00,000.
Add-ons: ₹5,00,000 for extra GPUs (e.g., 2 more 4090s or upgrading storage).
• Cluster Configuration:
• GPUs: 4 RTX 4090s (16,384 CUDA cores each; total 65,536 CUDA cores).
• GPU Memory: 96GB GDDR6X (24GB × 4 GPUs).
• CPU Power: 24 cores, 48 threads.
• RAM: 64GB DDR4.
• Storage: 1TB expandable.
• Power Consumption: ~1,600W for full load.
Performance Comparison
Feature | Mac Mini Cluster (25 units) | Multi-GPU Setup (1 system, 4 GPUs)
GPU Power | 250 integrated GPU cores | 65,536 CUDA cores (4× RTX 4090)
GPU Memory | 400GB unified (shared with CPU) | 96GB GDDR6X (dedicated)
CPU Power | 250 Apple cores | 24 high-performance cores
RAM | 400GB unified | 64GB DDR4 (expandable)
Storage | 6.4TB SSD | 1TB NVMe (expandable)
Training Speed | ~10-15× slower for large models | Optimized for deep learning tasks
Power Draw | ~1,000W | ~1,600W
Cost Scalability | Low (more devices needed) | High (add GPUs incrementally)
Key Insights
1. GPU Performance: The 4090s dominate in AI/ML tasks due to CUDA core count, memory bandwidth, and optimized software (PyTorch/TensorFlow).
2. Cost Efficiency: A single 4-GPU setup is more cost-efficient for high-end tasks than 25 Mac Minis.
3. Scalability: Add more GPUs to the system later for increased performance without major additional costs.
4. Energy Usage: Mac Mini clusters are more efficient but can’t handle large AI/ML datasets effectively.
Recommendation
• For High-End AI/ML: Go with the multi-GPU system with RTX 4090s.
• For General Workloads/Power Savings: Opt for the Mac Mini cluster.
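To reproduce the cluster-side totals above, or swap in your own unit count and local pricing, here is a small Python sketch; the 40W figure is a rough load estimate, not a measured number:

# Cluster totals from base M4 Mac Mini per-unit specs (16GB / 256GB).
unit = {
    "price_inr": 60_000,
    "cpu_cores": 10,    # 4 performance + 6 efficiency
    "gpu_cores": 10,
    "ram_gb": 16,
    "ssd_gb": 256,
    "watts": 40,        # rough load estimate per unit
}
N = 25
totals = {k: v * N for k, v in unit.items()}

print(f"{N} units: Rs {totals['price_inr']:,} | {totals['cpu_cores']} CPU cores | "
      f"{totals['gpu_cores']} GPU cores | {totals['ram_gb']} GB RAM | "
      f"{totals['ssd_gb'] / 1000:.1f} TB SSD | ~{totals['watts']:,} W")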
Is this based on a real-world setup, or did you just calculate this? I'm mostly curious how you determined the performance difference (such as tokens per second) between such setups.
Wow, stack them to make an AI hub! Who cares? What is an AI hub? Do people really do this? 😂😂😂
This is a waste of money. Who even buys Apple for anything besides setting their money on fire?
developers
Four base-model Mac Minis are about the price of one RTX 4090 (if you are lucky), with a crappy PC around it to make it actually do something. And an Exo cluster made of these four Minis can not only run larger models than that PC; they are also quieter and draw less power.
@20windfisch11 Why would you compare the price to a GPU instead of other desktop/mini pc system? Are you trying to be misleading?
@@bftjoe Because a ton of models, importantly Qwen and DeepSeek, work just fine with a ton of system RAM and a fast CPU.
@@bftjoe I actually said that the PC was needed for the GPUs to do anything.
If you are a hobbyist with interest in AI and want to self-host, the Mac Mini is currently great value.