Let me know what you think!
Have you ever thought of creating a community for hardware engineers?
Can you dive into the superconducting elements added to these advanced technologies (e.g. niobium)?
@@CircuitSageMatheus I believe this is one, and we are part of it ;)
Please do a vid on the company out of Dubai that is creating medical AGI, what kind of technology they are using, and what they plan to do. Medical AGI is very broad. Would love to see what that means.
What is the cost per performance comparison?
Awesome content, nowadays it is very difficult to find channels as rich in information as yours! Cheers to you for a job well done! 👏
Linus Tech Tips is all about it, with excellent up-to-date coverage of cutting-edge technology, and there are many other excellent channels too, BTW. It's just a tip, in case you weren't aware of them. Good luck. 🎉❤
Truly excellent content
Don't waste your time with this video. The CCP propaganda got a Russian girl to spew all kinds of shit and nonsense!
Your content is always very special and informative. You tend to choose topics that are not commonly found on other channels. The most important thing is the way you explain complex concepts so easily; that's truly awesome.
The sheer compute power of this chip is promising a new era in AI technology. I’m eager to see how this will be utilized in various applications. Kudos to the team behind this innovation!
Aliens going to start taking our AI computers like they've been taking the nukes, to protect us?
at this rate bitcoin will be susceptible to a 51% attack lol, that's so much power
Not really, but there are a lot of investors that are going to make a killing shorting Cerebras.
yeah don't go counting the benefits just yet.. that is a chip that draws a hundred fucking megawatts.
that thing can't run for more than moments without having parts of it vaporized by the heat.
with computing almost ALL of the energy goes into heat, so that is a bloody 99-megawatt heater the size of a chessboard you've got there....
you literally need a power plant to run this crazy thing.
why would they need either? and with the power consumption of these things we do not need nukes to bloody glass the planet lol
It seems like an apples to oranges comparison. Put it against the GH200 Superpod with 256 Grace Hopper Superchips. That is Nvidia's latest offering. It's not only fast, but energy efficient.
12x the gains. How the hell is nobody talking about it?
@@635574 VHS vs betamax, Bluray vs HD-DVD... the latter was better in both cases and still lost because they couldn't get adoption. Nvidia gave away a lot of very expensive silicon for nothing in some cases or a small pittance to get CUDA in the hands of research teams at universities, who standardized on it, and eventually started teaching it. I love the idea of a competitor but they won't have an easy road if they're not willing to give away a lot of compute, and unlike Nvidia they don't have the gamers and crypto addicts buying every graphics GPU they could get their hands on for double MSRP to bankroll it.
Nvidia fanboy spotted. Power efficiency is a non-issue when you're talking about the most powerful compute available, cuz most people won't have access to this kind of power until way later.
@@waterflowzz exactly, power efficiency is not the main issue 🤦
You don't get the point: as the complexity of the problems you feed to the AI grows, these two systems stop scaling together.
90% of the raw cost is the people working on the project, so having a system that does NOT require more work when your workload increases by orders of magnitude is a no-brainer.
You can start training your AI system months before you could on any Nvidia system.
The only thing Nvidia has on its side right now is the sheer volume of chips produced each month, so I guess you can build a GH200 AI cluster much faster than you can a Cerebras one: not cheaper, but faster, despite it being way behind in practicality and raw results.
If you ignore politics and AI conspiracies, it's a great time to be alive! Thank you for sharing these positive breakthroughs.
Another awesome video Ana! Doing direct interviews is a great addition to your repertoire. I have an interest in AI as a social science person. A lot of videos either go way beyond my ability to comprehend, or are filled with superfluous information just to fill time. You consistently put out interesting and coherent information, that I also trust is valid, because of your background.
This is very cool!
Thank you for keeping us up to date with the AI evolution!
wonderful video, did some research about Cerebras' innovations and found out they really have done different and valuable things.
The "wafer scale engine" is what Cerebras is known for: unlike a traditional GPU, it is produced as an entire wafer. Conventionally, multiple CPUs or GPUs are 'printed' by lithography onto a single wafer, and later processes cut them out of the wafer. Therefore, one reason Cerebras delivers much better performance is simply that its 'GPU' is bigger.
But this also leads to one problem: it's even harder to produce than NVIDIA GPUs. Wafers often come with defects; with the conventional manufacturing technique, individual defective chips can simply be discarded, whereas the Cerebras wafer scale engine has to cope with defects across the entire wafer. In addition, heat dissipation and even power delivery across the whole surface are big challenges.
Right now, Cerebras is cheaper because it's not yet that popular; once the market sees the advantages of their supercomputer, their price could go higher than the H100's, since these chips are really difficult to make at the current level of manufacturing technology.
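To make the defect argument concrete, here is a toy estimate using the classic Poisson yield model; the defect density is an illustrative assumption (not a TSMC or Cerebras figure), and the two areas are just "reticle-sized die" vs "whole wafer" ballpark numbers.

```python
import math

def poisson_yield(defect_density_per_cm2: float, area_cm2: float) -> float:
    """Classic Poisson yield model: probability a region has zero defects."""
    return math.exp(-defect_density_per_cm2 * area_cm2)

d0 = 0.1            # assumed defect density, defects per cm^2 (illustrative)
gpu_die = 8.26      # ~826 mm^2 reticle-limited die, in cm^2
wafer_engine = 462  # ~46,000 mm^2 wafer-scale engine, in cm^2

print(f"Reticle-sized die yield:  {poisson_yield(d0, gpu_die):.1%}")
print(f"Defect-free whole wafer:  {poisson_yield(d0, wafer_engine):.2e}")
# The 'zero defect' probability for a whole wafer is essentially nil, which is
# why a wafer-scale design has to tolerate defects with spare cores and
# rerouting rather than demanding a perfect wafer.
```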
Maybe I'm not looking hard enough, but this is the only place I've found good, well summarized info on AI hardware progress. Thanks Anastasi! 😊
Thank you for doing these videos and helping the rest of us to see what's going on in the world of AI and computing in general.
I appreciate your efforts 😊
OMG...
I had to watch this video because your introductory image is adorable!
I am very interested in Cerebras and Tenstorrent, since they seem to be the most viable alternatives to Nvidia, both being companies that make very scalable AI chips.
The interesting differentiation between Cerebras and Tenstorrent is that Cerebras started with big chips and is working its way down (in a sense, by enabling PyTorch compatibility), while Tenstorrent starts from small chips and evolutionarily works its way up.
It's interesting to see these contrasting startup philosophies at work in the same industry against basically the same main competitors. Hope to see you cover these two companies in future videos.
Actually the most viable alternative to Nvidia right now is AMD's MI series of processors. The MI300 series is due to be widely available in 2024, and it will probably beat the H100 in terms of performance and flexibility. The research I've done indicates that Cerebras and Tenstorrent are very distant alternatives at this point in time relative to both Nvidia and AMD. There's also Intel with their Gaudi series, which probably fits in comparatively alongside Cerebras and Tenstorrent; the most worrisome aspect there is the longevity of the roadmap, as Intel has been cutting product lines over the last few years. As we know, anything can change quickly since the AI sector is in its very early stages, so it's worth looking at all the players, including the current batch of underdogs.
@@geekinasuit8333 We agree: if Cerebras can lower its prices by 10x it can be in the competition; if not, AMD will be the best alternative. One Cerebras machine is roughly equal to 50 Nvidia GPUs in compute, but for the price of one Cerebras system ($2,000,000) you can buy 10 Nvidia DGX boxes ($200,000 each, with 8 A100s at ~$10,000 apiece), so on price Nvidia wins. And take into consideration that Nvidia is already expensive, very expensive. Cerebras needs to lower its price 4x to be competitive, 10x if it wants to be a real competitor.
I have no idea what she is talking about, but I keep watching her videos.
I've been selfishly hoping this company would stay a hidden gem 😂😂. Superior compute in terms of training models and on-premise inference. SUPERIOR.
Anastasi is such an amazing person
Agree 😊❤
You mean you have a crush.
Indeed.
Eugenics is good. Breed the superior specimens.
Lol.
Never heard of this channel before, until it just popped up on my homepage. And I'm glad it did: great, clear information, with appropriate graphics (when needed), very in depth, but still understandable.
One minor piece of constructive feedback: maybe tweak your audio settings a bit to decrease the harsh 's' sounds. I'm using headphones, and your 's'-es are a bit uncomfortable. Otherwise: great video!
Thank you! Noted
I am a simple man, video I see from Anastasi, video I like.
what a time to be alive XD ; luv to see the competition heat up between these top tier tech firms and the smaller startups that are rocking the boat =]
The bane of wafer scale computing has always been that some percentage of the wafer will have defects and be unusable. Does Cerebras have some way around that problem? There was a famous attempt at this back in the 80s, and the company couldn't solve the problem and went bankrupt (Trilogy Systems).
The final wafer processor is always specced after taking into account the dysfunctional parts of the wafer, meaning it's always assumed that some parts will be lost to imperfections.
There's redundancy built into the chip, so a wafer defect only reduces performance instead of disabling the entire chip.
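A minimal sketch of how spare-core redundancy can work in principle. The grid size, number of spare rows, and per-column remapping scheme below are invented purely for illustration and do not reflect Cerebras' actual fabric or routing.

```python
import random

ROWS, COLS, SPARE_ROWS = 10, 10, 1   # toy core grid with one spare row (illustrative)

# Pretend wafer test found a few defective cores.
defective = {(random.randrange(ROWS + SPARE_ROWS), random.randrange(COLS))
             for _ in range(3)}

def build_core_map():
    """Map each logical core to a working physical core, skipping defects.

    Logical row r in column c is served by the r-th working physical core in
    that column; the spare row absorbs the shift caused by a defect."""
    core_map = {}
    for c in range(COLS):
        working = [r for r in range(ROWS + SPARE_ROWS) if (r, c) not in defective]
        if len(working) < ROWS:
            raise RuntimeError(f"column {c}: more defects than spares can hide")
        for logical_r in range(ROWS):
            core_map[(logical_r, c)] = (working[logical_r], c)
    return core_map

mapping = build_core_map()
print(f"{len(defective)} defective cores hidden; logical grid is still {ROWS}x{COLS}")
```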
Wow, nice summary. I was actually wondering how they utilise all those wafer scale engines. Now it is clear. Thank you!
Anastasia In Tech, my engineering crush! (Not to be confused with my academic crush, Sabine Hossenfelder.) Exceptional content! Keep them coming!
it's great to see so many people and companies working on AI hardware, but without a full software stack, it won't be a credible competitor to NVidia. As ML technology advances, they'll have to make sure their compiler handles the workload scheduling efficiently. That's not an easy task.
What if they made their hardware compatible with the Nvidia software? I think it was mentioned in this video that existing TensorFlow code for CUDA can also work on their hardware.
Fantastic video, I just subscribed. Mr. Feldman was speaking my mind when addressing the tokenization of the Arabic language. I don't speak Arabic sadly, but I have been trying to find good models to handle it and found that only GPT-4 and BLOOM were decent. I think his company is on to something forging connections to the Gulf. Great video, thank you!
"Arabic" speakers do not speak the same language any more than Germanic languages speak the same language. The only "Arabic" such speakers share is that of their Prophet, and they consider that Arabic the superior language. So their only motivation in any of this is to spread that ideology. Even if they aren't aware of it, you can bet its the motivation of those in charge of pushing it.
Smaller isn't always better. I once theorized such a whole-wafer computer; the compiler part was beyond my skills, and a parallel data bus seemed like the only way. They have managed to remove the two compiler stages on the way down to machine language, and the single-stage compiler combined with the whole-wafer design has Nvidia beat, at a much lower cost, for the title of most powerful AI. Dude knows what he has; I will look into buying one of his systems for an upcoming product. Thank you, great video!
Thank you for your deeper introduction to Cerebras!!! I wouldn't have known about this, despite staying around Fremont and Santa Clara last month, if I hadn't dug into it much deeper…😄
I remember a time when my uncle, an engineer, got a PC with 40 MB of storage and I wondered how he would ever fill that much space. Today I need that space for one single digital raw photo 😅
It’s amazing how fast and capable hardware and software (some not so much 😂) has become
1998, my first computer (well, the "family computer", because they were very expensive for what they could do): Pentium II at 300 MHz, 32 megabytes of RAM and I think something like 500 megabytes of hard drive, but I'm not sure about that. Of course with floppy disk and CD drives, in that distinctive "greyed white" of the time. I was 13 back then and it feels like another era in the history of humanity 😅😂
I'd love to see a breakdown and comparison of this tech against Dojo: code scaling, watts per output unit, data types, and flexibility.
... and cost.
Dojo is a dead duck
We need more channels that focus on the compute chips & infrastructure of AI. All the buzz is around the software, but it's the hardware that makes it work.
Very interesting and shows a broader view than just the Nvidia or AMD approach. Mind boggling how fast and how far this work is going.
Whoa! This is awesome! Always brilliant content. Love this channel! Learning new words, like Wafer-scale, is eye opening!
very interesting, I wonder how scalable their production is, honestly seems like companies will be fighting for these limited-quantity high-speed chips, surprised I've never heard of them! Great vid
nice to see you are back. great show as always
Thanks for making these videos 😀
Love that everything you post comes with a great explanation and you always back up your information with real facts, like a documentary video! Thank you 😊 you're awesome! Your brilliance is awesome
I really liked that cover photo. Keep up the good info.
You are sounding much better now, I can understand what you are saying.
Nicely done, super interesting. I think your best yet
There's not a lot of information about Cerebras, so thanks for making this video. I'd like to know how flexible a machine like this is for experimenting with different models. Will you be limited to only a few kinds of models, and if so, what exactly are those limitations? One known issue that Cerebras acknowledges as an intentional trade-off is that a machine like this is limited in floating point precision and will not be suitable for models that require higher 64-bit precision; it appears the machine is optimized for 16-bit precision only. I expect there will be other limitations besides the FP precision, and a summary of what those limitations and trade-offs are (pros and cons) would be nice to know about.
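For readers unfamiliar with what "optimized for 16-bit precision" means in practice, here is a minimal PyTorch mixed-precision sketch of the style of training such hardware targets. It runs on an ordinary GPU (or falls back to CPU), not a CS-2, and the tiny model and hyperparameters are placeholders; it only illustrates the precision trade-off being discussed.

```python
import torch
from torch import nn

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
amp_dtype = torch.float16 if use_cuda else torch.bfloat16  # CPU autocast prefers bfloat16

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)

# Forward/backward run in 16-bit where safe; master weights stay in float32.
with torch.autocast(device_type=device, dtype=amp_dtype):
    loss = nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
print(f"loss: {loss.item():.4f}")

# Workloads that genuinely need FP64 (e.g. some scientific simulations)
# don't map well onto 16-bit-optimized AI silicon -- that is the trade-off.
```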
Thank you for bringing the next one to my watch list, I love your content.
Great video! Thanks!
Data quality matters vastly more than parameter count though. Improving LLMs and Stable Diffusion right now is all about figuring out how to get better data.
You do a wonderful job. Thank you very much for your outstanding content
Very intriguing! Thanks so much for sharing
What an interesting channel and a fabulous, clear presentation on this groundbreaking million-dollar AI hardware that will probably facilitate the unimaginable in the near future. Anastasi and her co-host deliver a vivid picture of what this rival to Nvidia has created. It is very exciting... seeing the future unfold in this enormous leap forward. It tickles me to think that gamers, with their need for the fastest speeds ten or fifteen years ago, willing to pay top dollar to get what they wanted, would create a niche market to spawn the likes of this chip made by Cerebras, only the size of an average floor tile but more powerful than anything known. This feeling of excitement seems to me like what it must have been to see the Wright brothers flying across the New York City skies for the first time. The significance of this chip is as yet unknown but greatly anticipated to become probably the biggest scientific tsunami that will change our civilized world as we know it. Amazing development to learn about, and thank you for your excellent presentation.
"Not everyone will get it" at 0:17 I got it. The _bootleneck_ with a boot on the neck of GPU supply. I kid, I love your content.
I am astonished to see that beauty and science can coexist
As an old-timer I appreciated the CEO's commentary when he threw in the term "sneaker net" while describing his AI monster.
Bright, beautiful, charismatic, informative, relevant, entertaining
Thumbnail is 🔥😎🙌🏼
This is exciting indeed. like it so much. You doing great Anastasia. God bless you and your family. this goes for all involved in your vid crew.
Amazing content is delivered with high clarity by the amazing presenter.
I guessed the company accurately before listening to the video. The TechTechPotato YouTube channel had some good content on this wafer-sized chip.
Your video was very well presented
Excellent video.
Wow! That bit at the end about not needing to write more code to expand the parameters/ use more chips.
Very well presented, thank you!
Thanks. I never heard of this company. Amazing.
Excellent information
Good info 👍 I was able to get some shares at IPO .......
I love the art behind him.
- Always interesting.
- Thx.
- I especially like the detail about important role of prep time to set up for training. These nuances can be lost in certain presentations of the data.
- As a teacher/consultant, I find that the fundamental problem is an incorrect and/or incomplete understanding of things. One must study wide and deep, and question one's understanding along the way. Many people are not willing or able to do this, or perhaps just don't think it's worth the time - but in some cases they do so to their detriment, and others will succeed where they fail. But to each their own, I guess.
Thanks for the update.
With a chip that size, aren't the yields extremely low, or is it even possible, since there is always a defect somewhere on the wafer? Or do the cores have some sort of fault tolerance built in, like deactivating the affected sections?
It's the fault tolerance thing, it's the only way they can make it work on waferscale with all the defects.
Thanks once again for bringing us great content.
Cerebras: AI supercomputer networked across three of the same type
Cerberus: Three headed hound that guards the gate to Hades
... just in case you may have been confused. Don't be.
Fascinating! I wonder how they bridge all those wafers? Also wonder how they transport heat away from them. There are megawatts of heat produced in a much smaller volume than with GPU cabinets.
This is similar in concept to the S1 supercomputer of the 1980s. It was a large wafer, but with gates to be connected, and could have a new design per week. They used the whole wafer and etched the back side so as to conduct coolant in channels too small to have Bernoulli turbulence.
One thing I've wondered about is why motherboards, or whatever you want to call what the chips sit on and the current travels through, aren't printed. Right now, all the circuitry sits on top, but if it were printed, you could print it in 3D so that the circuitry could be stacked in layers. Furthermore, all the circuitry would be protected from dust, something that slows it down.
What exactly does Feldman mean by "gradients" in the context of what is transmitted between geographically remote clusters?
The correct way of evaluating designs such as this is not to look at the raw TFLOPS number (use your favorite flavor of TFLOP) but at TFLOPS/mm2 (die area), TFLOPS/$ (product cost, essentially die area times cost to manufacture), or TFLOPS/W (the power needed to get the TFLOPS). The massive parallelization of AI workloads essentially makes scale-up (a large parallel collection of individual components) relatively simple. On this basis the Cerebras chips are not very compelling at all. Today most of the traditional companies are going the route of some kind of 2.5D integration, essentially putting multiple dies in a package. Server chips have already crossed reticle limits within a package, so Cerebras attempting to make the wafer the package (roughly 40 dies of 550-580 mm2 each) isn't compelling. Single-package server chips come in 4-die or 6-die configurations (let's assume 400 mm2 per die), which means roughly 10 such packages would match the wafer-scale structure Cerebras shows. A single 1U can easily accommodate 4 such packages, so what Cerebras shows can easily be accommodated in two 1U servers. The density of scale-up for the GPU companies is likely even higher. If you now factor in the need to hook large memory (DRAM) to the individual compute, wafer scale really looks unwieldy. Often the external I/O is done at the die perimeter, with the areal portion chewed up by power and thermals; here again wafer scale is at a large disadvantage simply because perimeter/area is proportional to 1/d. It might be much cheaper to singulate the dies and put them on the freaking large panels used by the large-screen industry if all we want is bragging rights on "see, I have X TFLOPS"; I never understand why engineers waste time with such things.
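To illustrate the normalization this comment argues for, here is a tiny helper that turns a raw TFLOPS figure into the per-area, per-watt, and per-dollar metrics mentioned above. Every number plugged in is a made-up placeholder, not a measured spec for any chip discussed in the video.

```python
from dataclasses import dataclass

@dataclass
class Accelerator:
    name: str
    tflops: float        # sustained TFLOPS on your workload of choice
    die_area_mm2: float  # total silicon area
    power_w: float       # power draw at that throughput
    price_usd: float     # product cost

    def report(self) -> None:
        print(f"{self.name:>10}: "
              f"{self.tflops / self.die_area_mm2:7.3f} TFLOPS/mm2, "
              f"{self.tflops / self.power_w:7.3f} TFLOPS/W, "
              f"{self.tflops / self.price_usd * 1000:7.3f} TFLOPS/k$")

# Placeholder numbers, purely to show the shape of the comparison.
Accelerator("SingleDie", tflops=300, die_area_mm2=800, power_w=400,
            price_usd=15_000).report()
Accelerator("WaferScale", tflops=7_500, die_area_mm2=46_000, power_w=20_000,
            price_usd=2_000_000).report()
```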
Data parallel training will not allow you to train larger models, just train whatever model already fits, faster.
If the limiting factor is the model size, the solution has to be to split the model across different compute units.
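As a rough illustration of what "splitting the model across compute units" means, here is a minimal PyTorch model-parallel sketch that places different layers on different devices. The two-stage split, layer sizes, and device choice are arbitrary examples, not how any particular vendor shards models.

```python
import torch
from torch import nn

class TwoStageModel(nn.Module):
    """Naive model parallelism: half the layers on one device, half on another."""
    def __init__(self, dev0: str, dev1: str):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.GELU()).to(dev0)
        self.stage1 = nn.Sequential(nn.Linear(4096, 1024), nn.GELU()).to(dev1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage0(x.to(self.dev0))
        # Activations cross the device boundary here -- this transfer is the
        # communication cost that pure data parallelism avoids.
        return self.stage1(x.to(self.dev1))

devices = ("cuda:0", "cuda:1") if torch.cuda.device_count() >= 2 else ("cpu", "cpu")
model = TwoStageModel(*devices)
out = model(torch.randn(8, 1024))
print(out.shape, out.device)
```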
very informative thank you
Looks like good news for AI. And bigger chips sound positive. Great video Anastasi
Thank You Anastasi❤️🦾😇🌹👋
Thank you!
Really impressive evolution of AI and of its usage. In the Middle East they will be a thousand steps in front of the others. Nice to find out!!!!
Thank you for the education.
Garfield was being nice to Odie when he was constantly trying to send him to Abu Dhabi.
The reality is that in anything related to computer performance, going with a monolithic approach is always superior. Since the beginning we have wanted modular computation, or to imitate biological cells à la the Alan Kay approach, but going monolithic reduces the complexity of the whole intercommunication problem.
Great work! Keep on rocking : )
Super interesting & well presented David v Goliath! Cerebras may be faster and more agile, but Nvidia has scale, demand & lots of smarts, and likely there will be no single winner, at least for now as the industry develops. Even mighty Nvidia cannot supply demand, so smaller, private Cerebras can only supply a few businesses, and in the by and by its advantages may lead to exponential growth. Or something like photonics or quantum compute may render everything obsolete, but the frontier now is what AI can do, rather than relative compute performance. Whoever gets to market with useful AI makes a fortune; if there is only one, they become God-like in power, whereas if there are two or more winners we have a balance of power, and perhaps lower-cost Cerebras will bring on a weaker-funded AI business to rival the Nvidia-based giants, to the benefit of humanity. Thanks for sharing!
A comparative analysis with Tesla's Dojo would be great 😊
You've been talking about Cerebras for ages. A really interesting video would be for you to explain why they seem to go nowhere.
I would imagine that it's more difficult to make it fault tolerant for waferscale than they let on, and it's also harder to program software that computes over such a wide area.
Startup - end of story :D :D :D
Excellent video, thank you! :o)
Everything on a single wafer must have some fascinating methods to isolate and bypass faults.
Just one thing: the A100 is 312 TFLOPS and the H100 is 4000 TFLOPS, so about 13x faster, not 2x as you say in the video. Otherwise great video. Thanks 🙏
Very interesting. Thank you.
Anastasi is lovely and very smart. A real pleasure to watch 😊
I could see Nvidia making a dedicated AI PCIe x16 card. In the future you would be upgrading not only a GPU but an AGI card too. Thankfully many boards have multiple PCIe x16 slots.
This was a great video btw, thanks for the information 🙂
This is a really interesting video. The part I struggle with is the performance comparison between nvidia and cerebras, seems like comparing apples to oranges. How many nvidia chips are equivalent to 1 cerebras? And then how do you define this equivalence? I suppose those papers you link to will have some details lurking in there somewhere but for now I’ll just rely on what is presented in this video.
Right now NVIDIA is the leader in AI chip design because of their outstanding performance, and that alone will outperform all the other AI chipsets, just like their advanced graphics card chipset designs did all these years. NVIDIA, Intel chipsets, and the Unreal software graphics engine is one partnership that would be great not just for gaming but for all applications.
Fascinating
Wow that is a HUGE chip!! Amazing 😍
Loved the ASML shot
I saw this yesterday, and tried to see what the largest supercomputers are. I could have sworn I found 1.1 something exaflop; and the combined Cerebras was like 64 exaflops. Do I have that right?
I found a great article yesterday; this one quote stuck in my mind for some strange reason,
"For example, a 40 billion-parameter network can be trained in about the same time as a 1 billion-parameter network if you devote 40-fold more hardware resources to it. Importantly, such a scale-up doesn’t require additional lines of code. Demonstrating linear scaling has historically been very troublesome because of the difficulty of dividing up big neural networks so they operate efficiently. “We scale linearly from 1 to 32 [CS-2s] with a keystroke,” he says."
Only 100 million dollars for one Condor Galaxy? That's the same price as a one-off Formula 1 car or even a stealth fighter; I'd call that a pretty good deal.
You are an amazing presenter!!!
Awesome Job Anastasi now where can I buy stock of Cerebras?
Gotta say Anastasi looks like the best version of the actress Liv Tyler … and a truly legitimate IC expert
What is the cost per TFLOP? Power per TFLOP? Is it 64 wafers, each 50x the compute of an A100, all taking 1.75 MW? If so, they'd be drawing about 10% more power than the equivalent fleet of 500 W NVIDIA A100s (64 x 50 A100s).
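A quick sanity check of that estimate, taking the commenter's figures (64 wafers, 50 A100-equivalents each, 1.75 MW total, 500 W per A100) purely as stated assumptions:

```python
wafers = 64
a100_equiv_per_wafer = 50      # assumed compute equivalence from the comment
cluster_power_mw = 1.75        # assumed cluster power draw from the comment
a100_power_w = 500             # assumed per-GPU draw from the comment

gpu_fleet_mw = wafers * a100_equiv_per_wafer * a100_power_w / 1e6
overhead = cluster_power_mw / gpu_fleet_mw - 1
print(f"Equivalent A100 fleet: {gpu_fleet_mw:.2f} MW; "
      f"wafer cluster draws {overhead:+.0%} relative to it")
# -> Equivalent A100 fleet: 1.60 MW; wafer cluster draws +9% relative to it
```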