Cerebras is NOT outperforming Moore's law. Moore's law states that the density of transistors will double every few years. In what way does the Cerebras chip outperform other chips according to this metric?
Please forgive me if you have already covered carbon nanotube bundle memory from Nantero, or nanodomain progress from Xyvex. If you have not covered extending Moore's law using Kurzweil's favored paradigm (nanotube arrays / "pistons"), could you? They might be able to store "state" without using energy or magnetism, but physical position read by lasers, for speed and resilience (MRIs + implants = safe). I know this is more theoretical than your recent videos, but Nantero did have a functional product... I apologize in advance if you've covered this already. Your channel is amazing! Thank you for the wonderful updates!
@AnastasilnTech would you consider sharing your investment portfolio? I am very new at doing my own investments and I would be very interested in which startups you like. Doesn't have to be $ amounts and I will not consider your words as investment advice, "disclaimer".
I GPU mine and rent my cards out for AI or PoUW workloads like FLUX. I avoided the 4090s but am getting ready to buy the 5090. I hated the heat that GDDR6X VRAM puts out, so I hope GDDR7 VRAM fixes that heat issue. I own 20 EVGA 3070 GDDR6 GPUs. So yeah, let's go 5090 GPUs! Before Xmas.
Hi Anastasia. Congrats! Another “huge” podcast. You are the only YouTuber uniquely qualified to bring the latest innovations in AI/processor technologies. I’ve been retired now for two years, after spending 40 years as a semiconductor equipment design engineer in Silicon Valley. So it’s especially fascinating to realize that not only is Moore's law still relevant and thriving, but some technologies carry a promise to far exceed any expectations. Thank you, again, for being such an enthusiast of your profession. P.S. It is especially fascinating that the first time I’ve ever heard about a private equity portfolio was on your program. Even your commercials are educational. 😊
Ma'am, if you are a chip designer, I think a profound and clear tutorial for computer architecture and practical chip design would be a contribution. Thanks for your videos.
@@Ubya_ Apparently, you haven't seen any of the many great courses available on YouTube, many of which are far better than those courses by Ivy League universities. Simply put, you're wrong, buddy.
@@shahabdolatabadi4116 well, that is actually an opinion. People learn based on several factors. Some are audio learners, some kinesthetic, and then there are the ones who can just read the information. Some use a combination of the methods. But when it comes down to the 'course', it's honestly gonna be better if they use one or more of the known learning styles to present the information. Sometimes it can come down to having an analogy that is easy for people to relate the information to. A good example of this is computer networking being compared to the mail/postal system. I have found some YouTube videos very informative because they meet these standards, and that makes it easier to digest the information for retention.
You are creating for yourself a remarkable reputation as the source of accurate, knowledgeable, and relevant information on IC technology. And this reputation is a valuable commodity that can’t be easily replaced. CONGRATS. I’ve been around chip tech since the PDP8 computer was built on 7400 series chips that only had a handful of gates per chip. There have been many technology approaches & architectures over the years, and you seem to have an accurate perspective of how the horse race of chip technology plays out.
As a person who has never been able to understand advanced math, which has prevented me from becoming an electrical or mechanical engineer, I have always found this kind of technology incredibly fascinating. Anastasia, you always explain things so well, and you always keep the entire segment fascinating. You make it "easy" for an inept person like myself to understand this material, and you make me "want" to learn more. Thank you for all the wonderful videos you make! I wish I would've had you as my math professor. Maybe I would've been able to fulfill my dream and become an engineer. 🙂
I have watched your content from the beginning and it has been very informative. I love the way you present the science of computing. Keep these coming. Thanks Anastasi🐿
Anastasi is insanely amazing! She is incredibly smart to understand this technology, and simultaneously, beautiful enough to be a super model! She represents the next step in human evolution!
My father taught me analog computing as a teenager, two pots could multiply. He was a rocket scientist and totally understood digital computing however he also continued to say that analog computing was not being used effectively. inertial guidance was partly analog in the beginning. How interesting to see the hybrid approach implemented in cmos.
Jonathan Mills was one of my favorite professors in college. He was doing a lot of amazing work on modern analog computing and hybrids: cns.iu.edu/docs/netscitalks/j-mills.pdf. Sadly, he has since passed. Great thinker, good teacher.
IBM was the computer to beat in the 1950s, 60s, and 70s. They started out using 4 bit words (Binary Coded Decimal) and moved through Extended B.C.D. (6 bit words) to 8 bit words (ASCII). Some modern equip. uses Unicode (16 bit words) which can represent nearly every written language in use today. The size of the word affects the amount of work the chip can do in a given bit of time and especially how much physical memory is used for each character. IF all you are doing is math, you don't need big words.
I have been excited about acceleration since 2021, when Blender changed from CUDA to OptiX. Oh boy, same GPU, but 7 times faster renders! Acceleration is the future for sure.
Your videos are awesome! They are to the point, up to date and with no distractions. I like your accuracy and enthusiasm; it is inspiring and whets the appetite for all the current processor developments.
After watching some videos of Neuralink's first participant, Noland... I wonder if, in the end, the analog component that digital is going to connect with, for the most powerful and most energy-efficient output, is our brains.
Thank you very much. You simplify a field I thought I could never understand into a more approachable conversation. Please keep making these videos 🎉
Any chance you could cover Extropic AI and their new thermodynamic processors? I think it seems like a promising area of development, but I haven't seen it covered by any experts I trust.
I once profiled a program that I suspected to be slow because it copied gigabytes of data multiple times. The copying time was only marginal. Instead, it spent most of the time performing a log calculation, then rounding the results. I implemented a low resolution log function that turned out to be faster than the fpu version. That's when I learned that calculating something up to an insane amount of digits is often a huge waste.
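A rough illustration of the low-resolution log idea from the comment above. This is only a sketch under assumptions, not the commenter's actual function: `fast_log2` is a hypothetical name, and the linear fit on the mantissa is just one simple choice. In pure Python it will not beat `math.log2`; the point is the precision-for-work trade that pays off in compiled code or hardware.

```python
import math

def fast_log2(x):
    """Coarse log2: exact exponent plus a straight-line fit on the mantissa.

    Worst-case absolute error is about 0.086, far looser than math.log2.
    """
    if x <= 0:
        raise ValueError("fast_log2 needs a positive input")
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
    # Line through (0.5, -1) and (1, 0) stands in for log2(mantissa).
    return exponent + 2.0 * (mantissa - 1.0)

for value in (0.1, 1.5, 8.0, 12345.678):
    print(value, fast_log2(value), math.log2(value))
```

Powers of two come out exact, and everything else lands within roughly a tenth of the true value, which is plenty if the result is rounded afterwards anyway.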
1. Jensen Huang: It’s ok, hopper.😂😂
2. TSMC has to keep its fabs super clean, so they must change the filters on top of the fabs frequently, maybe once every three months. And the activated carbon inside the filters must be completely new; they cannot bear the cost of reusing it.
3. Morris Chang once lived, maybe close to 20 years ago, in a house that is now right next to my brother's house, by the way.
7:30 To be fair to Moore’s law… it doesn’t really state that transistors “per chip” double. It states that transistors “per area” double. The Cerebras chips are not breaking that - they are still putting the same number of transistors per square cm as the other TSMC N5 chips are. It’s just that historically “chip sizes” have been roughly the same, generation to generation, due to manufacturing constraints, heat dissipation, system design constraints, package design, etc. So assuming the chips are roughly the same size, if transistors shrink and the transistor count per area doubles from one generation to the next, then transistors per chip has roughly doubled too. But in reality, the Moore’s law charts should all have transistors per square centimeter as the y-axis, not transistors per chip. (And for the pedants, of course Moore’s law has also been modified to indicate “performance per area” instead of just transistors per area.)
Having said all that, breaking free of the standard maximum die size as constrained by the photomask, and interlinking multiple dies so they effectively function as one chip without using an interposer or other off-die layer like Intel “Tiles” or AMD “chiplets” use, is a pretty incredible breakthrough.
DTA: Decade of Technological Advancement. It's a good acronym. DTA does seem to be applicable to this era in time. Analog is a good use of the letter "A" too, as I love the concept of analog computing and chips. It would be nice to someday print 100 electronic devices on a single wafer, and use them out of pocket at a whim. The phone is starting to become this, but I'm sure some smarty pants can collect more divergent ideas on a new multi device. I'd love to be able to microwave a pizza from my pocket... haha. Gr8! Peace ☮💜
Some of these concepts are so elusive for non-technical types such as myself at least. Thank you for presenting great information along with superb and clear explanations that include helpful, and I presume accurate, visuals. Will always look forward to delving deeper without fear with your particularly effective guidance. Buona Pasqua.
Kind of makes you wonder why Cerebras doesn't just make round chips. If they are going to take up the entire wafer, why not just use all of it? (They could place notches on the edge to help with orientation.) I think the biggest problem for monolithic chip designs is that they can't easily be modularized or repurposed, and if the chip fries you lose the whole thing, so even though the chip has redundancy built in, it's still not a fully redundant system.
I suspect it has to do with the circuit design software everyone is using. Laying out the logic for a compute unit in a curved segment at the edge of the wafer is possible, but the software isn't set up for that.
@@HelgeMoulding It has to do with the reticle step and repeat: X/Y grid step and repeat. The edge of the wafer is exposed, but due to it being circular you never get full rectangular coverage.
5:32 «Honestly, 4 bits is quite low and that makes me curious to see how well it's going to work for inference application.» There's a relatively recent paper out there that suggests you can do pretty well using a single trinary bit, or values of -1, 0, +1 (about 1.585 binary bits equivalent), for your weights.
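For readers curious where the 1.585 figure comes from: it is log2(3), the information carried by one three-valued weight. The sketch below also shows one simple way to squash float weights to -1, 0, +1. The scaling by mean absolute value and the name `ternarize` are assumptions made for illustration, not necessarily the scheme used in the paper the comment refers to.

```python
import math

def ternarize(weights):
    """Map float weights to {-1, 0, +1} plus one shared scale factor."""
    # Scale by the mean absolute weight; fall back to 1.0 if every weight is zero.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

bits_per_trit = math.log2(3)  # information content of one three-valued weight
example = [0.9, -0.8, 0.05, 0.0, -1.2, 0.3]  # made-up weights
q, scale = ternarize(example)
print(q, round(scale, 3), round(bits_per_trit, 3))
```

With weights restricted to -1, 0, +1, every multiply in a dot product turns into an add, a subtract, or a skip, which is where the hardware savings would come from.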
It's so hard to concentrate on the content when it's being delivered by such an exceptional presenter. I do agree, these are exceptional times we are living in.
You have to love science. Everything always sounds so bombastic and highly complex. But then you get down to it and it comes down to "just use 2 chips", "just decrease the size of the numbers you calculate with", "just use some redundancy".
whaoooo i'm waiting for this new technology (analog chip with capacitors) with great impatience ! it will blow my mind ! I love all your videos ! I'm a programmer.
I think her explanation of the double chip is interesting. It makes me realize that it really isn't some great breakthrough; it was actually the second choice that Jensen had to make because TSMC could not deliver the next great density improvement at that scale. That's interesting to know. It's also interesting to know that their costs are going to go up and their margins may not be as high. One thing I've never understood, though, is why TSMC isn't the company that makes all the margins, and why they let a company like Nvidia, which just gives them designs and doesn't make anything, make the margins.
Commercial analog applications!? I’ve never heard of these EnCharge people, but they seem like the real deal! I’ve always been really fascinated with analog computers, so i’m really excited that they’re making a comeback!
Wow, I had no idea someone was using the "interconnects" as a capacitor, considering that is already occurring within a chip. That is genius and opens a massive area for alternative computing methods. It could reduce heat/power and the need for additional transistors. Very exciting stuff. I need to do some research on this, thank you.
Appreciate your expertise. Would love to hear your comparison with Dojo architecture. They too use one of the special packaging processes from TSMC to put 25 die on a single wafer as I understand it. Seems like that would improve yield on final chip integrated wafer higher than single chip all in yield. You get to pick the 25 best yield in smaller chips and assemble them on a larger package with only interconnect which seems similar to Blackwell but at a whole other scale. Would love to hear your thoughts.
Haven't really done any machine learning, but otherwise I'm a very experienced software developer - 4-bit applications can go some distance, but I have to wonder how 4-bit node-weightings can yield real neural-net performance at lo-energy; That said, I would say the chip-architecture could isolate the lo-bit consumption from hi-bit consumption transparently from any given executable layers; Disappointing if they haven't attended to this, but I would guess they have. Enjoying your briefings on modern chip-landscapes thanks Anastasi (Finally subscribed) 🐄
I wrote a workshop paper years ago that shows that lower precision can be compensated for just by including that limit in the training process. Backprop has to be done in floating point to accumulate small weight updates. But all forward calculations have to be done in fixed point. The imprecision causes prediction error, which backprop compensates for naturally.
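The training setup described above can be sketched in a few lines. This is a toy under stated assumptions, not the commenter's paper: a single weight, an input fixed at 1.0, a hypothetical `to_fixed` rounding helper, and gradients that treat the rounding step as if it were the identity. The full-precision weight accumulates small updates while the forward pass only ever sees its fixed-point copy.

```python
def to_fixed(x, frac_bits=4):
    """Round x to the nearest multiple of 2**-frac_bits (a fixed-point grid)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def train(target=0.3, lr=0.1, steps=200, frac_bits=4):
    w = 0.0  # full-precision "master" weight, updated in floating point
    for _ in range(steps):
        w_q = to_fixed(w, frac_bits)  # forward pass uses only the quantized weight
        error = w_q * 1.0 - target    # prediction error for input x = 1.0
        grad = 2.0 * error            # gradient of error**2, rounding treated as identity
        w -= lr * grad                # small updates accumulate in the float copy
    return w, to_fixed(w, frac_bits)

w, w_q = train()
print(w, w_q)
```

The quantized weight ends up on a grid point next to the target, even though each individual update is smaller than one grid step.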
Since Nvidia was forced to use N4P, it's safe to say that another leap in performance is almost guaranteed in the next iteration from the node improvement alone. Regarding the decision to make FP4 the standard, the tradeoff between more parameters/less precision makes sense since there are recent studies showing that quantized models show negligible performance drop when compared with full precision, and that training models with lower precision from scratch may close the gap even further (given the right architecture, of course). About the hybrid chips using capacitors for addition, I wonder if another benefit in the future could be the ability to asynchronously discharge the accumulated charge when a certain threshold is reached. Such an architecture could resemble neuron spikes and the way the human brain works asynchronously and at lower frequency.
When I think of it, regular RAM is also capacitor-based. Why didn't chip manufacturers slap a lot of capacitors, in the form of metal layers, directly on the chip? After all, transistor-based cache doesn't scale well compared to logic gates. It will require rewriting EDA software from the ground up, but the gains will be astonishing, in my opinion.
Very good overview. Thanks, Anastasia. My comment is that a *solid software stack* is the real key to the adoption of every new hardware architecture out there. That's also the success behind nvidia...
Moore's law talks about the cost, not the amount: "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years."
Great content, but I must admit that the first time I got to your channel I thought that you were created by AI. Anyway, keep up the good work. Saludos
lol, i'm trying to pay attention, but i keep thinking "OMG! she's waving a wafer around again" xD I still remember "the incident" with the shattered wafer ;)
giant chips comprised of compartmentalized chiplets running in parallel is *obviously* better than the current commercial paradigm cudas are jamming on, but I believe that the ideal hierarchy of space and appropriated cache looks more like IBM/Motorola's Cell. Visualizing answers to human problems should be given a structure that provides chip designers with an inherently optimized skeleton to experiment with while maximizing throughput for unique formulas.
I wonder how you're going to power such a massive chip. It'll need a large amount of supply feeds. Fantastic episode - I learn something new every single time, thank you. I do like you getting animated about good ideas, nice.
I'm looking forward to seeing how the world of maths is going to have to catch up. The truth is the programming languages and standards we think of as "Computer Science", they're built around particular styles of compute. Styles that don't well fit with the continuous signal processing nature of analog environments, especially for the more advanced opportunities ahead of us. How the world of mathematics is going to have to make inroads from untapped edge knowledge from the perspective of CS, bringing in things like cyclostationarity, probabilistic logic, and modulation theory into the conversations with people who, so far, really haven't been comfortable thinking outside the boolean box? That'll be quite the journey! Who all emerge as the more current thought leaders will be quite a thing to witness and learn from! Honestly, it reminds me of early IBM days history, the innovative newness to the untapped intersections we have at hand. There's numerous forms of esoteric maths that have stubbornly, persistently developed through generations. How many of such mountains of foundational work are now, finally, ripe for applications? My guess is it'll be quite a few more than many yet think, as the post-boolean floodgates are nearing being flung open. With luck, such innovations will prove put to good use for us all.
The Cerebras strategy is similar to what they were doing with hard drives before, when they had some space reserved to take the information lost by mechanically or magnetically bad clusters.
Any time the silicon die yields are below 100%, you have to discard a number of chips because of flaws. This is why a 16 core CPU chip is marketed with, for example, only 12 cores. Nvidia's Blackwell GPU pairs each working chip on the wafer with another working chip. This increases yield by a very large amount, compared to a single GPU chip of twice the surface area.
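The yield argument above can be made concrete with the textbook Poisson yield model, Y = exp(-D * A). The defect density and die area below are made-up numbers chosen only for illustration, not figures for any real process or product, and `poisson_yield` is a hypothetical helper.

```python
import math

def poisson_yield(defects_per_cm2, area_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-defects_per_cm2 * area_cm2)

D = 0.1  # hypothetical defect density, defects per cm^2
A = 8.0  # hypothetical area of one die, cm^2

# Two separate dies: any two good dies can be paired, so the usable
# fraction of silicon is roughly the single-die yield.
paired = poisson_yield(D, A)
# One monolithic die of twice the area must be defect-free everywhere.
monolithic = poisson_yield(D, 2 * A)

print(f"paired dies: {paired:.3f}, monolithic: {monolithic:.3f}")
```

Under these assumptions the monolithic yield is the square of the single-die yield, so the gap widens quickly as dies get larger or the process gets dirtier.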
I wasn't thinking about analog chips until watching your channel. Yes, the brain is still the most impressive computer on earth, and I have always felt that biology can store data with incredible density and complexity. This potential can then be extended to the analog chip. It is great to watch another video from the "decade called Acceleration."
Anastasia, there is a paper, "Ultra-Low Precision 4-bit Training of Deep Neural Networks" by X. Sun, and it's like FP4 is way better because neurons don't need to be so precise.
Not gonna begin to pretend I understood everything said, but the 10% that I did understand was absolutely intriguing! Good to know there’s still wiggle room to squeeze more out of Moore’s Law (so to speak).
I was an Intel engineer back in 1990. We had the first CPU with more than a million transistors. This was the state-of-the-art i860, a 64-bit vector CPU... Now they are talking about 208 billion transistors; that is about 200k times as many transistors as the i860! Or in other words, 200k of those 40 MHz chips in one die.
Are they working at similar die sizes to what you were producing back then? It's wild seeing the variation of sizes at the moment, especially with the aggregate setups as you go up the pricing/performance tiers with mainstream cpus.
It would be interesting to know how they implemented 4-bit multiplication. One could do it as a pure table lookup (there are only 256 possible results) but it's hard to tell if it's faster to do the full multiplication calculation in hardware instead. With the huge amount of transistors the new chips have, it might be faster to always compute everything on the fly.
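The table-lookup option mentioned above is easy to picture in code. This sketch assumes plain unsigned 4-bit integers (FP4 as used for inference is a floating-point format with its own encoding, so a real table would hold different values), and `mul4` is a hypothetical helper name.

```python
# All 16 x 16 = 256 products of two unsigned 4-bit values, computed once.
TABLE = [a * b for a in range(16) for b in range(16)]

def mul4(a, b):
    """Multiply two 4-bit values by indexing the table with the 8-bit pair (a, b)."""
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must fit in 4 bits")
    return TABLE[(a << 4) | b]

print(len(TABLE), mul4(7, 9), mul4(15, 15))
```

In hardware the same trade shows up as a small ROM versus a small multiplier array; which one wins depends on area, wiring, and latency rather than on anything this software sketch can measure.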
From what I've heard from other content creators, the LIKE carries more weight in the YT algorithm if it is given > 10 min into the video, or in the last 10% of the video (if shorter than 10 min). This was YT's attempt to reduce the effect of video-voting-bot-armies which (could) artificially pump up creators. Apparently, gaming the algorithm is a thing.
4 bit calculations used to be how everything worked - we just cobbled together several compute units to build microcoded processors that had wider registers. On a separate note, calling these things "GPUs" seems to be a misnomer. These are not for graphics, and can't really be repurposed for graphics.
Anastasi have you ever done a video on EDA tools, particularly the automation of analog design? Presumably better analog automation tools would encourage the incorporation of analog elements in new chip design.
Thank you for your video. in the mid 70s I had a chance to be the early user of a hybrid computer (digital / analog computer) and in the 60s I had a chance to use an analog computer. I'm glad to see analog making a comeback. I think there are other interesting applications for analog. For example maybe it can be used in Ensemble modeling. (I recommend The Primacy of Doubt by Tim Palmer).
Go to l.linqto.com/anastasiintech and use my promo code ANASTASI500 during checkout to save $500 on your first investment with Linqto
You said before we will get RISC-V. Will we? I mean Consumer PC.
Yep. She's the cornerstone for all ai and technology matters for me.
Not like the auto-proclaimed "Tech-Unicorn" Annia is the real deal.
I wonder what your personal liability could be with respect to private equity investments.
Morse law? Gordon Moore meets Samuel Morse? 40 years, eh?
She's amazing indeed... but the only one? this is a disrespect for some others!
As an engineer from this industry, I would like to say this channel is very informative and up to date.
Okay 🤣
"profound and clear" and "tutorial" don't really go hand in hand. There's a reason why a degree takes years.
Way back when I was getting my BSEE in the 70's I always loved analog computers and am glad to see them now making a comeback!
How many modern EE or CS majors understand what an operational amplifier (op-amp) is, or even realize what the "op" really meant?
Analogue technology is way too underrated. Incredible knowledge has been lost.
@@monad_tcp those *are* digital, but there's a *lot* of talk/chatter about analog ai chips and analog neuromorphic chips.
Great video today. I've heard a little bit about these new chips but your explanations really helped me understand why they are so important.
I've read their light paper. It's interesting. I would be curious to learn more if anyone can connect me to them..
It was a bit easier to imagine useful outcomes for fp8 calculations, but 4 bit (1/8th precision!) is just wild. Fun times in computer science 🎉
Have you seen the paper about high performing one bit LLM’s? They actually require two bits, but they’re still half the size of an FP4.
Awesome breakdown of some of the most important technologies in the world!
That Cerebras chip is just radical, this is why I like your channel, your videos are one step ahead of the rest on cutting edge technology x)
There is a research paper showing scaling these networks down to binary and there was great efficiency, speed and memory gains.
Sometimes less is more
Thank You for another video. I am gonna watch it now.
Was looking forward to your video ever since the Blackwell event!! Thank you!
I love your videos, Ana, because I am always learning amazing new things!😊 Happy Easter!😉
You have the best channel on You Tube! I really enjoyed the latest one: "This is Huge". Incredible info! Thanks!
Love how balanced your videos always are! You almost never succumb to the hype! :)
7:30 to be fair to Moore’s law… It doesn’t really state that transistors “per chip” doubles. It states that transistors “per area” doubles. The Cerebras chips are not breaking that - they are still putting the same number of transistors per square cm as the other N5 TSMC chips are. It’s just that historically “chip sizes“ have roughly been the same, more or less, generation to generation, due to manufacturing constraints, heat dissipation, system design constraints, package design, etc. So assuming the chips are roughly the same size, if transistors shrink then the transistor amount doubles from one generation to the next, then that roughly means transistors per chip has doubled. But in reality, the Moores law charts should all have transistors per square centimeter as the y-axis, not transistors per chip. (And for the pendants, of course Moore’s law has also been modified to indicate “performance per area” instead of just transistors per area).
Having said all that, breaking free of the standard maximum die size as constrained by the photo mask and interlinking multiple dies together so they effectively function as one chip is a pretty incredible breakthrough, without using an interposer or other off-die layer like the Intel “Tiles” or AMD “chiplets” use.
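The per-area versus per-chip distinction above can be checked with rough arithmetic. A minimal sketch in Python; the transistor counts and die areas are approximate, publicly quoted figures and should be treated as assumptions for illustration, not as data from the video:

```python
# Approximate, publicly quoted figures (assumptions for illustration only).
WSE3_TRANSISTORS = 4.0e12   # Cerebras WSE-3, wafer-scale part
WSE3_AREA_MM2 = 46_225
H100_TRANSISTORS = 80e9     # Nvidia H100, roughly reticle-sized die
H100_AREA_MM2 = 814


def density_per_mm2(transistors, area_mm2):
    """Transistors per square millimetre."""
    return transistors / area_mm2


wse3_density = density_per_mm2(WSE3_TRANSISTORS, WSE3_AREA_MM2)
h100_density = density_per_mm2(H100_TRANSISTORS, H100_AREA_MM2)

# Per-chip count differs by a large factor; per-area density does not.
count_ratio = WSE3_TRANSISTORS / H100_TRANSISTORS
density_ratio = wse3_density / h100_density

print(f"WSE-3: {wse3_density / 1e6:.0f} million transistors per mm^2")
print(f"H100:  {h100_density / 1e6:.0f} million transistors per mm^2")
print(f"count ratio: {count_ratio:.0f}x, density ratio: {density_ratio:.2f}x")
```

With these assumed figures the transistor count differs by about 50x while the density stays in the same range, which is the commenter's point: the gain comes from area, not from denser transistors.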
Agree, it is misleading to say this breaks Moore’s Law. Not true. Wafer scale has been done before. Nothing new.
Anastasi would make a great Chief Engineer on a starship. Now we just need to create the starship! Not the SpaceX one, more like the Enterprise.
She could be 7 of 9.
DTA: Decade of Technological Advancement. It's a good acronym. DTA does seem to be applicable to this era in time. Analog is a good use of the letter "A" too, as I love the concept of analog computing and chips. It would be nice to someday print 100 electronic devices on a single wafer, and use them out of pocket at a whim. The phone is starting to become this, but I'm sure some smarty pants can collect more divergent ideas on a new multi device. I'd love to be able to microwave a pizza from my pocket... haha. Gr8! Peace ☮💜
Some of these concepts are so elusive for non-technical types such as myself at least. Thank you for presenting great information along with superb and clear explanations that include helpful, and I presume accurate, visuals. Will always look forward to delving deeper without fear with your particularly effective guidance. Buona Pasqua.
Another excellent video Anastasi. Technology is going crazy. 🙂
Kind of makes you wonder why Cerebras doesn't just make round chips. If they are going to take up the entire wafer, why not just use all of it? (They could place notches on the edge to help with orientation.) I think the biggest problem for monolithic chip designs is that they can't easily be modularized or repurposed, and if the chip fries you lose the whole thing, so even though the chip has redundancy built in, it's still not a fully redundant system.
I assume a partial chip section is not a completed circuit.
Step and repeat is how IC reticles are exposed.
@@supermodal the cores are tiny so they could go right up to the edge of the wafer
I suspect it has to do with the circuit design software everyone is using. Laying out the logic for a compute unit in a curved segment at the edge of the wafer is possible, but the software isn't set up for that.
@@HelgeMoulding It has to do with the reticle step and repeat. X/Y grid step and repeat. The edge of the wafer is exposed, but due to it being circular you never get full rectangular coverage.
Love your deep dives into the latest technology, Anastasi! Thank you. :)
5:32
«Honestly, 4 bits is quite low and that makes me curious to see how well it's going to work for inference application.»
There's a relatively recent paper out there that suggests you can do pretty well using a single ternary digit (a "trit"), with values of -1, 0, +1 (about 1.585 binary bits equivalent), for your weights.
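The "about 1.585 bits" figure is just log2(3), the information in one three-valued weight. A small Python sketch of that number, plus one simple way to map float weights onto -1, 0, +1; the scaling rule (mean absolute value) and the sample weights are assumptions for illustration, not necessarily what the paper does:

```python
import math

# Information carried by one three-valued weight (-1, 0, +1).
bits_per_ternary_weight = math.log2(3)


def ternarize(weights):
    """Map float weights to {-1, 0, +1}.

    Uses the mean absolute value as a scale, then rounds and clips.
    This particular rule is an illustrative assumption.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


# Hypothetical weights, chosen only to show all three output values.
weights = [0.42, -0.07, -0.91, 0.03, 0.55]
ternary_weights, scale = ternarize(weights)

print(f"{bits_per_ternary_weight:.3f} bits per ternary weight")
print(ternary_weights)
```

Small weights collapse to 0 and large ones saturate at -1 or +1, so only the sign and a rough magnitude survive; the shared scale is stored once per group of weights.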
It's so hard to concentrate on the content when it's being delivered by such an exceptional presenter. I do agree, these are exceptional times we are living in.
Lol it takes me like 3-4 times the watch time to actually finish the video because i have to pause and think about everything she brings up
You have to love science. Everything always sounds so bombastic and highly complex. But then you get down to it and it comes down to "just use 2 chips", "just decrease the size of the numbers you calculate with", "just use some redundancy".
whaoooo i'm waiting for this new technology (analog chip with capacitors) with great impatience ! it will blow my mind ! I love all your videos ! I'm a programmer.
Love the hardwork and dedication that you put in making these videos for us Anastasi👏👏👍
It’s fun to see you so excited about analog. I think we need an app that translates your body language into an investment indicator!
I think her explanation of the double chip is interesting; it makes me realize that it really isn't some great breakthrough. It was actually the second choice that Jensen had to make, because TSMC could not make the next great density improvements with the greater scale.
That's interesting to know. It's also interesting to know that their costs are going to go up and their margins may not be as high. One thing I've never understood, though, is why TSMC isn't the company that makes all the margins, and why they let a company like Nvidia, which just gives them designs and doesn't manufacture anything, make the margins.
Your insights are deep and your explanations are outstanding.
Commercial analog applications!? I’ve never heard of these EnCharge people, but they seem like the real deal! I’ve always been really fascinated with analog computers, so i’m really excited that they’re making a comeback!
from 8bit to 4bit ... we are looking at the next Turing Award candidate here :D
So they have found that you get better and more efficient intelligence with less accuracy. It all makes sense now. That explains everything.
No need to stop there. Just read a paper that claims you can drop to 1.58 bits (just 0 and +/- 1) and get basically the same quality.
@@almightysapling BS
@@Sven_Dongle It's true.
Interesting talk, thanks Anastasi. Dan :)
Wow, I had no idea someone was using the "interconnects" as a capacitor, considering that is already occurring within a chip. That is genius and opens a massive area for alternative computing methods. It could reduce heat/power and the need for additional transistors. Very exciting stuff; I need to do some research on this. Thank you!
Appreciate your expertise. Would love to hear your comparison with Dojo architecture. They too use one of the special packaging processes from TSMC to put 25 die on a single wafer as I understand it. Seems like that would improve yield on final chip integrated wafer higher than single chip all in yield. You get to pick the 25 best yield in smaller chips and assemble them on a larger package with only interconnect which seems similar to Blackwell but at a whole other scale. Would love to hear your thoughts.
I am grateful for your podcast. Thank you
Love the content, I've sent it to my friends
Haven't really done any machine learning, but otherwise I'm a very experienced software developer - 4-bit applications can go some distance, but I have to wonder how 4-bit node-weightings can yield real neural-net performance at lo-energy; That said, I would say the chip-architecture could isolate the lo-bit consumption from hi-bit consumption transparently from any given executable layers; Disappointing if they haven't attended to this, but I would guess they have. Enjoying your briefings on modern chip-landscapes thanks Anastasi (Finally subscribed) 🐄
I wrote a workshop paper years ago that shows that lower precision can be compensated for just by including that limit in the training process. Backprop has to be done in floating point to accumulate small weight updates. But all forward calculations have to be done in fixed point. The imprecision causes prediction error, which backprop compensates for naturally.
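The idea in the comment above (quantized forward pass, full-precision weight updates) can be sketched in a few lines of Python. This is a toy one-weight example under stated assumptions: the step size, learning rate, and data are hypothetical, and passing the gradient straight through the rounding step is one common choice, not necessarily the method in the commenter's paper:

```python
def quantize(value, step=0.25):
    """Round to the nearest multiple of `step` (a stand-in for fixed point)."""
    return round(value / step) * step


# Toy data generated from y = 1.5 * x (hypothetical).
data = [(x, 1.5 * x) for x in (-2.0, -1.0, 0.5, 1.0, 2.0)]

w_float = 0.0         # full-precision "shadow" weight, updated by backprop
learning_rate = 0.05

for _ in range(200):
    for x, y in data:
        w_q = quantize(w_float)          # forward pass sees only the quantized weight
        error = w_q * x - y              # prediction error includes quantization error
        grad = 2 * error * x             # squared-error gradient, passed straight through
        w_float -= learning_rate * grad  # small updates accumulate in floating point

print("float weight:", w_float, "quantized weight:", quantize(w_float))
```

The float weight never has to match the target exactly; training only needs to push it into the range that rounds to the best quantized value, which is why the small updates must accumulate at higher precision.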
Thanks for the updates .. Always thought analog computing was undervalued ..and that giant chip thats just incredible
How does Cerebras deal with the cooling problem over such area?
or pay for it?
Since Nvidia was forced to use N4P, it's safe to say that another leap in performance is almost guaranteed in the next iteration from the process improvement alone.
Regarding the decision to make FP4 the standard, the tradeoff between more parameters/less precision makes sense since there are recent studies showing that quantized models show negligible performance drop when compared with full precision, and that training models with lower precision from scratch may close the gap even further (given the right architecture, of course).
About the hybrid chips using capacitors for addition, I wonder if another benefit on the future could be the ability to asynchronously discharge the accumulated charge when a certain threshold is reached. Such architecture could resemble neuron spikes and the way the human brain works asynchronously and with lower frequency.
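The threshold-discharge idea in the last paragraph resembles an integrate-and-fire neuron. A minimal discrete-time Python sketch of that behaviour; it is a software analogy only, with made-up input values and threshold, and says nothing about how a real capacitor-based chip would implement it:

```python
def integrate_and_fire(inputs, threshold=1.0):
    """Accumulate incoming charge; emit a spike and reset at the threshold.

    Returns the list of time steps at which a spike was emitted.
    """
    charge = 0.0
    spike_steps = []
    for step, current in enumerate(inputs):
        charge += current
        if charge >= threshold:
            spike_steps.append(step)
            charge = 0.0  # discharge the (simulated) capacitor
    return spike_steps


# Hypothetical input currents, one per time step.
spike_steps = integrate_and_fire([0.5, 0.25, 0.5, 0.125, 0.125, 0.75, 0.25])
print(spike_steps)
```

Output is produced only when enough charge has built up, so activity (and, in hardware, presumably energy use) follows the input rather than a fixed clock.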
While there is statistical variation in the exact release date of individual advances, Moore's law marches on.
When i think of it, regular RAM is also capacitor based
Why didn't chip manufacturers slap a lot of capacitors, in the form of metal layers, directly on the chip?
After all, transistor cache doesn't scale well compared to logic gates
It will require rewriting EDA software from the ground up
But gains will be astonishing in my opinion
Anastasi...you smashed this future new chip design review.
We need bigger GPUs ! 😎
the man can really sell it...right
Idea for the next video: make a review of your stock picks related to semiconductors and related industry.
Very good overview. Thanks, Anastasia.
My comment is that a *solid software stack* is the real key to the adoption of every new hardware architecture out there. That's also the success behind nvidia...
Awesome video! Great info here :) I especially appreciated the lead on private equity investing. Thanks
Moore’s law talks about the cost, not the amount: "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years."
People keep redefining Moore's Law so they can say it's still relevant. It was never a measure of performance.
Great content, but I must admit that the first time I got to your channel I thought that you were created by AI. Anyway, keep up the good work. Saludos
lol, i'm trying to pay attention, but i keep thinking "OMG! she's waving a wafer around again" xD I still remember "the incident" with the shattered wafer ;)
Never heard about analog chip. Very cool indeed!
I like how well you explain things very good. 👍 👍
Who else ran to the channel at sublight speed
I would be impressed by anyone running at greater than sublight speeds.
Beans
Everyone
Duh I have mass
@@AustinThomasPhD I identify as a tachyon, you bigot!
Good video! You obviously know what you are talking about.
giant chips comprised of compartmentalized chiplets running in parallel is *obviously* better than the current commercial paradigm cudas are jamming on, but I believe that the ideal hierarchy of space and appropriated cache looks more like IBM/Motorola's Cell. Visualizing answers to human problems should be given a structure that provides chip designers with an inherently optimized skeleton to experiment with while maximizing throughput for unique formulas.
I wonder how you're going to power such a massive chip. It'll need a large amount of supply feeds. Fantastic episode - I learn something new every single time, thank you. I do like you getting animated about good ideas, nice.
I'm looking forward to seeing how the world of maths is going to have to catch up.
The truth is the programming languages and standards we think of as "Computer Science", they're built around particular styles of compute. Styles that don't well fit with the continuous signal processing nature of analog environments, especially for the more advanced opportunities ahead of us.
How the world of mathematics is going to have to make inroads from untapped edge knowledge from the perspective of CS, bringing in things like cyclostationarity, probabilistic logic, and modulation theory into the conversations with people who, so far, really haven't been comfortable thinking outside the boolean box? That'll be quite the journey!
Who all emerge as the more current thought leaders will be quite a thing to witness and learn from!
Honestly, it reminds me of early IBM days history, the innovative newness to the untapped intersections we have at hand. There's numerous forms of esoteric maths that have stubbornly, persistently developed through generations. How many of such mountains of foundational work are now, finally, ripe for applications? My guess is it'll be quite a few more than many yet think, as the post-boolean floodgates are nearing being flung open.
With luck, such innovations will prove put to good use for us all.
The Cerebras strategy is similar to what they were doing with hard drives before, when they had some space reserved to take the information lost by mechanically or magnetically bad clusters.
Awesome video Anastasi!
Wasn't Apple's Ultra chip a dual well before this "first time" chip?
You are brilliant Anastasi ❤ thank you so much for sharing
thank you
Any time the silicon die yields are below 100%, you have to discard a number of chips because of flaws. This is why a 16 core CPU chip is marketed with, for example, only 12 cores.
Nvidia's Blackwell GPU pairs every working die from the wafer with another working die. This increases yield by a very large amount, compared to a single GPU chip of twice the surface area.
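The yield argument above can be made concrete with the simple Poisson defect model, in which the fraction of defect-free dies is exp(-D * A). A rough Python sketch; the defect density and die area are hypothetical, and the model ignores partially good dies and packaging losses:

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-defects_per_cm2 * area_cm2)


DEFECT_DENSITY = 0.1  # defects per cm^2 (hypothetical)
DIE_AREA = 8.0        # cm^2, roughly a reticle-sized die (assumption)

# Option A: test dies individually, then pair two known-good dies.
single_die_yield = poisson_yield(DIE_AREA, DEFECT_DENSITY)

# Option B: one monolithic die with twice the area.
double_die_yield = poisson_yield(2 * DIE_AREA, DEFECT_DENSITY)

# Fraction of silicon that ends up in a working product.
advantage = single_die_yield / double_die_yield

print(f"single die yield: {single_die_yield:.3f}")
print(f"double-area yield: {double_die_yield:.3f}")
print(f"pairing good dies uses {advantage:.2f}x more of the silicon")
```

Under this model the double-area yield is the square of the single-die yield, so the advantage of pairing tested dies grows quickly as dies get larger or defects more common.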
Thanks and wish you more success ❤
thank you for this great video !
I wasn't thinking about analog chips until watching your channel. Yes, the brain is still the most impressive computer on earth, and I have always felt that biology can store data with incredible density and complexity. This potential can then be extended to the analog chip. It is great to watch another video from the "decade called Acceleration."
Anastasia, there is a paper, "Ultra-Low Precision 4-bit Training of Deep Neural Networks" (X. Sun), and it seems like FP4 is way better because neurons don't need to be so precise.
Another great video!!!! Thanks!!!
Not gonna begin to pretend I understood everything said, but the 10% that I did understand was absolutely intriguing!
Good to know there’s still wiggle room to squeeze more out of Moore’s Law (so to speak).
Amazing!!!!Congrats
I was an Intel engineer back in 1990. We had the first CPU with more than a million transistors. This was the state-of-the-art i860 vector CPU, 64 bits... Now they are talking about 208 billion transistors; that is about 200k times as many as the i860! Or in other words, 200k of those 40 MHz chips in one die.
Are they working at similar die sizes to what you were producing back then? It's wild seeing the variation of sizes at the moment, especially with the aggregate setups as you go up the pricing/performance tiers with mainstream cpus.
Anastasia, I Can not overstate the distinction between memory and mammary. The difference is overwhelming.
The video is awesome !
It would be interesting to know how they implemented 4-bit multiplication. One could do it as a pure table lookup (there are only 256 possible results) but it's hard to tell if it's faster to do the full multiplication calculation in hardware instead. With the huge amount of transistors the new chips have, it might be faster to always compute everything on the fly.
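The table-lookup option mentioned above is easy to sketch. A minimal Python version for signed 4-bit integers; the integer encoding is an assumption for illustration (a 4-bit floating-point format would have a different value set but the same 16 x 16 = 256 input pairs), and nothing here reflects how any real chip implements it:

```python
# All signed 4-bit integers: -8 .. 7 (16 values, so 16 * 16 = 256 input pairs).
VALUES = range(-8, 8)

# Precompute every product once; rows and columns are offset by 8
# so that the value -8 maps to index 0.
LUT = [[a * b for b in VALUES] for a in VALUES]


def mul4_lookup(a, b):
    """Multiply two signed 4-bit integers via table lookup."""
    if a not in VALUES or b not in VALUES:
        raise ValueError("operands must be in the range -8..7")
    return LUT[a + 8][b + 8]


table_entries = sum(len(row) for row in LUT)
print(table_entries)        # number of precomputed products
print(mul4_lookup(-8, 7))   # same result as -8 * 7
```

In hardware the trade-off is between the area and access time of a 256-entry table and a small multiplier circuit; the sketch only shows that the table is small enough to enumerate completely.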
Anastasi, what do you think about Qualcomm's future in this space?
Thanks. Be interested in a power consumption update. EVs and hyperscale data centers proliferate faster than grid improvements. All the best
It is very interesting! Thanks!
Great video. Where does Dojo fit in this scheme.? Does it have any advantages?
I give a LIKE after 5 sec. although the clip itself takes 21 minutes.
I'd be disappointed if you didn't.
That's impressive.I am so proud.
*tips his fedora
From what I've heard from other content creators, the LIKE carries more weight in the YT algorithm if it is given > 10 min into the video, or in the last 10% of the video (if shorter than 10 min). This was YT's attempt to reduce the effect of video-voting-bot-armies which (could) artificially pump up creators. Apparently, gaming the algorithm is a thing.
Thanks
Thank you
4 bit calculations used to be how everything worked - we just cobbled together several compute units to build microcoded processors that had wider registers. On a separate note, calling these things "GPUs" seems to be a misnomer. These are not for graphics, and can't really be repurposed for graphics.
Hahahaha 😂😂😂 This is the first time I've seen you telling jokes out of the blue. Cheers 🎉 Have a great Easter!
Thanks for the video.
Should they really be called GPUs now? They aren't processing graphics, but code for LLM, etc. Wouldn't MLPU or AIPU be more appropriate?
They are already being called NPUs ("Neural Processing Units") by Intel, Qualcomm, AMD, etc.
When Anastasi goes to the car dealer to negotiate a purchase, they end up paying her.
Anastasi have you ever done a video on EDA tools, particularly the automation of analog design? Presumably better analog automation tools would encourage the incorporation of analog elements in new chip design.
Thank you for your video. in the mid 70s I had a chance to be the early user of a hybrid computer (digital / analog computer) and in the 60s I had a chance to use an analog computer. I'm glad to see analog making a comeback. I think there are other interesting applications for analog. For example maybe it can be used in Ensemble modeling. (I recommend The Primacy of Doubt by Tim Palmer).