Very interesting presentation, thx
00:04 Cerebras aims to revolutionize AI compute with a co-designed architecture
02:06 Architecture focused on neural networks
06:25 Memory bandwidth enables full performance in neural network computation
08:36 Cerebras core hardware architecture flexibility
13:08 Cerebras chip has 84 dies with 850,000 cores on a single 300mm wafer
15:27 Homogeneous array of cores across the wafer for unprecedented fabric performance
19:21 Cerebras architecture utilizes dataflow mechanisms for weight computations
21:12 Single chip enables high-performance neural networks
25:02 Scalable clustering and wafer-scale chips enable large model access to everyone
Hi. There's something that a few people were wondering about: why is the Wafer-Scale Engine square, when it looks like there's room for ~28 more complete, attached tiles?
It's a good question! The answer is rather prosaic, we're afraid. If the WSE weren't rectangular, power delivery, I/O, mechanical integrity, and cooling would all become much more difficult, to the point of impracticality.
Take a look at the virtual teardown on our website and you may get a feel for some of these challenges: www.cerebras.net/cs2virtualtour
The upshot is that a mere 850,000 cores will just have to suffice. ;)
@@CerebrasSystems I think I get the idea, thanks.
@@CerebrasSystems Would it be possible to lop off some of those edge tiles to make mini engines?
I didn't catch that much about the routing protocol and how the dies actually communicate on the WSE-2. You guys have a lot of things going on. Congratulations 🎊 😊
Incredible work. How do you scale a trained model down so that you can put it in something smaller and run inference in real time for control of a system?
Is the CS-2 used only for training?
Will there come a time when this architecture is applicable to massively concurrent inference?
Hi Ralph, good question. The vast bulk of our customers have used our systems for training LLMs or for HPC applications.
We have had a couple of projects using it for inference, like one with Lawrence Livermore National Laboratory where they offloaded an unwieldy inference step from many nodes of their Lassen supercomputer to one of our systems. You can read the case study here: www.cerebras.net/cerebras-customer-spotlight-overview/spotlight-lawrence-livermore-national-laboratory/
But in principle, our architecture should make a terrific concurrent inference platform, because we can run many inferences (hundreds or even thousands, depending on the model) in parallel across our massive array of cores.
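To illustrate what that data-parallel serving pattern might look like, here is a minimal Python sketch. The replica count, the toy model, and the thread-pool stand-in are all hypothetical; on the actual hardware each replica would map to a group of cores on the wafer, not an OS thread.

from concurrent.futures import ThreadPoolExecutor

NUM_REPLICAS = 8  # hypothetical; the reply above describes hundreds to thousands

def model_forward(x):
    # Stand-in for one replica's forward pass over a single request.
    return x * 2

def serve(requests):
    # Fan independent requests out across the replicas; each request is
    # handled in parallel, so throughput scales with the replica count.
    with ThreadPoolExecutor(max_workers=NUM_REPLICAS) as pool:
        return list(pool.map(model_forward, requests))

print(serve(range(16)))  # 16 requests served by 8 parallel replicas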
Re: the die-to-die interface at about 15:15.
You mentioned you use an upper metal layer to cross the scribe lines between the dies. What does the reticle look like for this? Is this a regular mask whose alignment is just offset so it straddles the scribe lines of the rest of the wafer? Is this something TSMC does regularly for other products? Or is this a new process to have reticles on the same wafer that don't align on top of each other?
Aloha and thanks! Way to go! Just imagine what you will be doing ten years from now! Do you have a public roadmap?
Thanks, 808 Big Island! Sadly, no public roadmap. You'll just have to keep watching!
Was wondering if MemoryX is actually an independent device outside of the WSE-2 wafer? The fact that it has better sparse performance at the hardware level is very interesting.
What is the yield of that wafer, sir? Thank you.
Super fast, lightning-speed AI system. Great!
👀👀👀👀👀👀
If this WSE is really that good, why is nobody talking about Cerebras AI while Nvidia is still printing money?
Because they are f-u-c-k-e-d up
Because today's biggest models don't fit on one Cerebras chip
@@Marqui17 Hmm. So it's not possible to put together more of them in order to make the models fit on such a system?
@@hg6996 I guess you should be able to interconnect them and split the model across them, but then you are introducing the same complexities Nvidia has, taking away Cerebras' main advantage
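To make the trade-off in this thread concrete, here is a minimal Python sketch of layer-wise (pipeline) model parallelism. The device names, the toy layers, and the two-way split are all hypothetical; the stage boundary marks where the inter-chip transfers (the complexity mentioned above) would occur in a real system.

# Toy "model": four layers, assumed too big to fit on one device.
layers = [lambda x: x + 1, lambda x: x * 3, lambda x: x - 2, lambda x: x ** 2]

# Hypothetical partition of the layers across two wafer-scale devices.
stages = {"wafer_0": layers[:2], "wafer_1": layers[2:]}

def run_pipeline(x):
    for device, stage in stages.items():
        # In a real multi-chip system, crossing this boundary means a
        # network transfer between chips, the same cost that multi-GPU
        # model splitting pays.
        for layer in stage:
            x = layer(x)
    return x

print(run_pipeline(2))  # ((2 + 1) * 3 - 2) ** 2 = 49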