Check out the new ASUS Vivobook S 15: asus.click/vbs_anastasi
Please explain why we don't have quantum computers with Ning Li's room temperature superconductor?
@Anastasi In Tech - What about using i-squared-L logic and/or vacuum channel FETs, possibly on chiplets? I2L seemed very promising when first introduced, but its power consumption was high since transistors were all large at that time. As a bipolar technology it will not suffer from gate leakage problems. Are there any other reasons why it might not work? As for "vacuum" channel FETs, they are 10 times faster or more, partly because they use free electrons. They also benefit from nanoscale features, are extremely radiation resistant, and they can operate comfortably at temperatures up to hundreds of degrees Celsius. Also, they don't actually require a vacuum when built at small nanoscales.
@@YodaWhat This is about Anastasi's Asus Vivobook commercial she boldly snuck into her main content?
@@fluiditynz - I left my comments and question here because it is the most likely place for her to see it. Nothing to do with the laptop she's promoting.
When I first started programming, and RAM was off chip and typically a few KB, we'd spend a lot of dev time working out how to do as much as possible in as little RAM as possible and as few clock cycles as possible. These days the demands to cut development time and get new features out, driven more by senior management and Product Owners than by real customer demand, seem to have ditched those ideas. If it's too slow the customer is expected to just buy a higher-spec machine, and new developers are taught ways to shorten development time but not execution time. I think that this is a false economy. About 10 years ago I was able to shorten a big data-processing job from 3 days to under 20 minutes, on the same hardware, by applying the techniques I'd learned back in the 1980s to key functions. It took me 5 days, but when this is something that has to be run every week the saving soon stacks up.
You are absolutely right. Once I participated in a service job to get a power station running. The problem was to bring the gas engines up and running as fast as possible. After a few days a programmer was flown in to look for alternative assembler instructions to save a clock cycle here and a clock cycle there.😁
Wirth's Corollary to Moore's Law:
Any improvement in Hardware performance will be negated by code bloat at an equivalent rate.
Kinda like traffic in London.
It's not a false economy, just a different emphasis due to the change in price structure.
In the old days, memory was expensive, so we tried to economize its use. Today's memory is so cheap that software development time has become the most expensive part of a system.
@@gorilladisco9108 The cost of memory is largely immaterial. It's the cost of execution time. Say you've got a transaction that currently takes 10 minutes to complete but, if the code was optimised, would take 7 minutes. To optimise the code would take the developer an extra 5 days of effort, and the developer earns £30 an hour (that's the mid-point for a developer where I work), so that's about £1,100 in wage cost, but once it's done that cost is done. Once rolled out, the application is used by 200 people paid £16 an hour (I have some specific applications we use in mind here). Saving 3 minutes per transaction means either those same staff can process 30% more transactions or we can lose 60 staff at a saving of just over £7,000 a day. That extra development time would repay itself in a little over an hour on the first day, and after that it would be pure cost saving.
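A rough back-of-envelope check of the arithmetic in the comment above - a minimal sketch assuming 7.5-hour working days (the day length is an assumption; the wage, headcount and timing figures come from the comment itself):

```python
# Rough payback estimate for spending extra developer time on optimisation.
# Figures are taken from the comment above; the 7.5-hour working day is an assumption.

HOURS_PER_DAY = 7.5

dev_rate = 30.0           # £/hour, developer
dev_extra_days = 5        # extra effort to optimise
dev_cost = dev_rate * HOURS_PER_DAY * dev_extra_days    # one-off cost

staff = 200               # users of the application
staff_rate = 16.0         # £/hour
old_minutes, new_minutes = 10, 7    # transaction time before/after

# Fraction of staff time freed up by the faster transaction
time_saved_fraction = (old_minutes - new_minutes) / old_minutes    # 0.30

freed_staff_equivalent = staff * time_saved_fraction                 # 60 people
daily_saving = freed_staff_equivalent * staff_rate * HOURS_PER_DAY   # per day

payback_hours = dev_cost / (daily_saving / HOURS_PER_DAY)

print(f"One-off dev cost: £{dev_cost:,.0f}")
print(f"Daily saving:     £{daily_saving:,.0f}")
print(f"Payback time:     {payback_hours:.1f} working hours")
```

Run as-is this gives roughly £1,125 one-off cost, about £7,200 saved per day, and a payback of a little over an hour, consistent with the figures quoted.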
NO COPILOT! NO RECALL! This future is PRISONPLANET!
You really have a knack for making complex topics engaging and easy to follow for everyone! Breaking down the challenges of SRAM and introducing phase change memory in such a clear manner is no small feat. Excited for more content like this!
👍🏽💚🌴☀️🌏
Has datsbus ended?
@@Raphy_Afk 😂😅no..my English is bad🐪☀️
@Magastz love💚and peace 🌏
Not bad on the eyes either
The problem with chiplet design is heat management.
Since every layer is active, it burns energy and produces heat, and this isn't good.
A secondary problem is the bus interconnect because stacking requires shared lanes, so memory layers are in parallel, making the bus interconnect a bottleneck.
Last but not least is signal strength and propagation time: stacking layers requires precise alignment and adds electrons jumping around, so there's a potential limiting factor in electron propagation, noise and possible errors. This isn't much of a problem if the system is built around it, but it still is a limiting factor.
There are solutions: since there's one master and multiple slaves there's no risk of collisions, so you can make a lot of assumptions on the drawing board... but buses are going to become wider and more complex, and that will add latency where you don't want it.
My 2 cents.
- I wonder if they run veins of metal in between the layers to send the heat to a radiator.
- They put the L3 cache on the second layer, which by its nature is quite removed from the logic circuits.
Heat, latency, voltage regulation, signal integrity, etc. Stacked dies have never been simple, which is why there aren't many of them.
Science communicators who actually are professionals in their field are always welcome. Thank you Anastasi
I didn't even know she was from the field, I thought she was just smart. But I guess that makes sense
@@nicholasfigueiredo3171 Given the way she talks, I would have guessed her field was steering investors into various markets. The technical rundown is useful, but the whole discussion is still clearly framed like "guess what's going to make a lot of money in the near future?".
She is a technology communicator. Learn what science is, please. I'm guessing you are a Republican, so I wouldn't expect you to understand the difference.
The point "good endurance 2*10^8 cycles" prohibits its use for cache memory. But it's really a viable and competitive option as a replacement for Flash memory!
Thanks. Amazing video. It's kind of interesting how it always comes down to the same principles. First shrinking the size in 2D, then layering stuff, and eventually going into the 3rd dimension. And when that reaches its limits, then change the packaging and invent some hybrid setup. Next, change the materials and go nano or use light etc. instead. Even the success criteria are usually similar: energy consumption, speed or latency, size and area, cost of production, reliability and defect rate, and the integration with the existing ecosystem.
And then go into the 4th dimension as well :D
I'd be curious about the thermodynamic side effects of phase change memory during transitions, as the crystallisation would release heat while amorphization would be cooling.
I greatly admire the passion you infuse into your presentations. Your work is outstanding, please continue this excellent effort. Thank you!
You explain things so well, thanks for a well thought out presentation
Nice idea. Very similar to Nantero NRAM, which also uses the van der Waals effect to provide resistive cells using carbon nanotubes for SSD/DRAM universal memory.
I've been waiting for NRAM for 20 years, and it is only now beginning to make its way into the data centre. Let's hope that this technology takes less time to mature.
So, the two biggest old school technologies that are slowing progress seems to be memory and batteries.
Yup!
Also, a shortage of railways.
I worked on Micron/Intel's PCM, Optane, for a few years. While we were making progress on some of the problems you mentioned, the venture ultimately failed due to the economics of producing the chips as well as a lack of customers. Would be cool to see it make a comeback in the future
I am shocked she failed to mention Optane as well - "new technology" lol.
Had they held on till CXL was here, IMO it could have taken off; it had great promise, it was just on the wrong interfaces
I thank you for your service. When Intel announced that they were ending Optane, I bought 6 of those PCIe drives; I caught a fire sale. Those drives are the fastest drives I have for doing some disk-intensive studio work. I wish they could've gotten the price down to around $100-$200 for the good stuff. I actually got 6 Optanes for $45 a piece. I lucked out and bought a box.
Amazing!
This girl researched exactly what I wanted to know.
Thanks.
Been waiting for your vid.... Love the content
Very well explained. Thanks
We need more Journalism with clarity to present for the public the real challenges and advancements of Technology.
Very interesting, I like the way you present info clearly and concisely
Thank you Anastasi - great presentation!
Subscribed... Always interested in intelligent people. You understand what you are saying and are not just spewing words. Fascinating.
Very comprehensive and interesting video. Thanks Anastasi! 👍
My fave memory joke: Stand in the nutritional supplement section of a store and look at the products with a confused expression. When someone else is nearby, ask "Do you remember the name of that one that's supposed to help memory?"
It's interesting that you talk about your experience in chip design. Maybe you could make a video talking about your experience in chip design?
Thank you for your presentation. I found it fascinating. The phase change memory, amorphous crystal back to uniform crystal array, seems like the mental models used to explain demagnetization around the Curie point.
Linked to my Substack, title: "The very definition of brilliant". That means you, Anastasi. 😊
Very interesting. Thanks for sharing your expertise. There is always something interesting in your videos. At least in the three or four I have seen so far.😊
It has been discussed for decades that close stacking of chips has advantages of speed and size. The issue is heat generation, thus trying to reduce the total charge (electron count per bit). New memory technology is required with far smaller charge transferred per operation.
I used to use magnetic memory... when I worked on DEC PDP-8e. It was called core memory, you could actually see the core magnets and wires that were wrapped around the cores.
The words "dynamic" and "static" are a reference to the powering method between state changes. You kind of hinted at this with the TTL logic diagram, but didn't expand. Static is faster because it doesn't have to wait for the re-fresh cycles before it can change state. Static also runs hotter and consumes more power- there are no free lunches ;-)
Not exactly. DRAM consumes power all the time, because it needs constant refresh to preserve contents. SRAM only consumes power during state change. Both consume some leakage current though, and with that, SRAM consumes more due to having more transistors per bit cell. DRAM also consumes considerable current to change state, because of its larger gate capacitance. Overall, DRAM tends to consume more power per bit but costs less and is more compact, which is why we use it for main memory and reserve SRAM for cache and internal registers.
That was a great video very informative. You're right, it is an exciting time to be alive with all the evolving technology.
Well said. Excellent video Anastasi!
Your channel has really improved over the 2 or so years I've followed you. I'm impressed!
Thank you for being here
PCM memory chip technology has been in R&D since the mid 2000s. Intel, STMicroelectronics and Ovonyx were in the game together in a joint development starting around 2005. Samsung was also doing research in PCM. I believe the biggest player now is Micron Technology. And you are correct about all the advantages of PCM. I believe the two big challenges are being able to program the device into two or more distinct, well defined resistance states reliably, coupled with manufacturing very small structures with precise dimensions. Nvidia is talking about PCM.
This sort of tech is very interesting, because depending on how it advances, it stands to change the computing landscape in one or more different ways. If Phase-Change Memory is fast enough and gets good enough density, it can replace SRAM in L3 cache. If the speed cannot get high enough, it could still find use as an L4 cache or a replacement for DRAM. If all else fails, I bet it could give Flash storage a run for its money.
Great stuff. As someone who built their own desktops through computer conventions in the 90s I appreciate you bringing me up to date on where we stand now in personal computer development😊
This helps me immensely with my DD into the tech & companies involved in the memory sector, Thank you very much Anastasi!
Stacking silicon...who woulda thought ...now it makes perfect sense for chip real estate. Thank you for your brilliant assessment of the latest chip technology. You have expanded my knowledge regularly.
😮😮😮 Really liked this info. Informative, to the point and exciting.
This was an excellent and very informative episode!
In addition to learning heaps about memory, I really enjoyed hearing you say SRAM lots.
It's quite bizarre that you thought PCM memory is a future replacement for SRAM, as it has a switching speed of 40ns (on par with DRAM), according to the paper you cited. This is an order of magnitude slower than SRAM. The only currently viable option to replace SRAM is SOT-MRAM, which TSMC is working on. Go research SOT-MRAM😁
It is good enough for cache application but very bad for register memory.
It also involves a physical change to the medium, which means wear and limited number of writes.
I believe a similar principle has been around since at least the 90s. I used to have a CD-R/W type device that used a laser to heat up spots of a special metallic medium, changing it from smooth to amorphous. Could be rewritten some number of times.
I will say though, your point is probably good and valid, but could have been made more constructively.
@@kazedcat It's not good enough for cache; modern caches are at most in the low dozens of ns, and 40ns is DRAM levels of latency
This is true. PCM is totally useless as an SRAM replacement and doesn't have sufficient speed or rewrite resilience. Honestly, she really failed to understand its use case. It's a great alternative to floating-gate FLASH memory, not SRAM!
what about 4ds memory? 4.7 nanosecond write speeds
Paul Schnitzlein taught me how to design static RAM cells. This video speaks to me. Yes the set/clear, and sense amps are all in balance. It is an analogish type circuit that can burn a lot of power when being read.
loved that memory zinger, ur so awesome!
Another banger video. Do you have discord channel to reach out to?
Telegram probably, if she's Russian
NO COPILOT! NO RECALL! This future is PRISONPLANET! NO WORK NON-STOP!
I love your explanations. Nice work! 👍
Thank you for this video. It's great. My two issues: (1) heat dissipation is not addressed (over cycles there is growth of the H.A.Z.); (2) one thing I heard about and remember vaguely was an attempt at self-healing logic (rather, materials + control circuitry), which is aimed at reducing the need for redundancy in elements at the core of the chip (the hottest and fastest environment) and also at improving the chip lifetime (cycles 'til dead). I would be grateful if you could address both.
Loved the graph you put together with the memory pyramid (access time vs where is used, with volatility information)!!
P.S. Your accent also becomes easier and easier to understand!
Smart and beautiful - well, this is something new.
I hope you become a trend, so that our kids can stop following brainless influencers
Thank you for sharing this new & exciting development 😊
Thank you! Just wanted to say you have a mistake in the figure you show (e.g. 12:38), labelling the latency of flash memory as milliseconds (1/1000 s) when, as you say in the audio, the latency is in microseconds (1/1,000,000 s)
One of the chief benefits I can see in going to optical computing is the ability to have associative addressing through polarization and multiple concurrent optical reading/writing heads for RAID-like processing.
As always, fantastic work. I am not so enthusiastic right now about the new technology: an endurance of 2E8 is amazing for something like storage, but the computer will go over that in no time for a cache. Even a microprocessor that is not superscalar and runs in the GHz range will be accessing memory on the order of 10^9 times per second. Clearly, that access is per cell and not for the full memory, but they need to improve that number a lot.
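To put that endurance concern in perspective, a minimal worked example of how quickly a 2×10^8-cycle budget is exhausted at cache-like write rates; the per-cell write rates below are illustrative assumptions, not measurements:

```python
# How long would a cell with 2e8 write-cycle endurance last under different write rates?
# The per-cell write rates are illustrative assumptions.

endurance_cycles = 2e8    # endurance figure quoted in the comment above

for writes_per_second in (1e9, 1e7, 1e5):   # very hot, warm, and cool cache lines
    lifetime_s = endurance_cycles / writes_per_second
    print(f"{writes_per_second:>8.0e} writes/s -> cell worn out after "
          f"{lifetime_s:,.0f} s (~{lifetime_s/3600:.2f} h)")
```

Even the mildest assumed rate wears the cell out within an hour, which is why the endurance number matters far more for cache than for storage.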
I remember hearing about the SRAM scaling issue some time before the Zen 4 release, but then haven't heard anything, even though I kept hearing about shrinking nodes. Been curious what was coming of that. I was thinking that since it's not benefiting from the scaling, it may have been counterproductive regarding degradation etc. I wonder if that is what is happening with the Intel 13th and 14th gen K SKUs? I guess we will find out soon enough. Thanks for the update, I'm glad they are on top of it!
This does remind me of a mechanical (robot-related) movement solution.
They used the same idea in a mechanical way.
It works like muscle cells.
Suggest captions. I think I’d like Anastasi in Tech even more.
This was an unexpected good video. This is my first video watch of the channel.
This is an excellent explanation of the current state of IC memory. Thanks.
I believe that down the line we will need to use a different processor architecture than the von Neumann one that we use today (i.e. having logic and memory separated), an architecture that instead has an "in-memory compute" design, or perhaps a mix of them.
In the end, the speed of light makes it hard to compute over longer distances (i.e. cm or even mm), especially when the frequency goes up and the data becomes even larger.
So basically smart RAM chips with shaders?
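A minimal numeric sketch of the distance argument in the comment above: how far a signal can reach within one clock period, assuming an effective on-chip/on-package propagation speed of about half the speed of light (that 0.5c figure is a rough assumption, not a process-specific value):

```python
# How far can a signal travel within one clock period?
# Assumes an effective propagation speed of ~0.5c, a rough ballpark assumption.

C = 3e8                    # speed of light in vacuum, m/s
EFFECTIVE_SPEED = 0.5 * C  # assumed effective on-chip/on-package speed

for freq_ghz in (1, 3, 5):
    period_s = 1 / (freq_ghz * 1e9)
    reach_mm = EFFECTIVE_SPEED * period_s * 1e3
    print(f"{freq_ghz} GHz: reach in one cycle ≈ {reach_mm:.0f} mm "
          f"(≈ {reach_mm/2:.0f} mm if the signal must also return)")
```

At 5 GHz the return-trip reach is on the order of 15 mm under these assumptions, which is why centimetre-scale trips to off-chip memory cost multiple cycles.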
Thank you for pointing this out!👍 Not just on-chip SRAM memory, but operating memory in general has a lot of catching up to do with the compute logic, not only because of the limitation of further shrinking SRAM and the demand from AI workloads. Operating memory has historically been left behind the compute logic and in a way ignored "nature's" way of things (brain neurons), where memory is the same as and as fast as compute/processing while having sufficient capacity. Maybe PCM or other memory technologies will deliver that in the future; however, I agree with you that L1 cache will most definitely continue to use SRAM for the foreseeable future, and L2/3/4 with larger capacities will most likely go first with stacked SRAM before moving to a new technology like PCM or resistive memory.
Although I do not comprehend all the things you mentioned, what I do understand I find very fascinating. Your videos and those of others help me to decide on what companies and technologies in which to invest (= gambling) at the Wall Street Casino. Investing in stock is like playing Blackjack. The more you know, such as via "card counting", the better your chances of winning. For me, your advice is akin to card counting when it comes to gambling on stock purchases. Thanks for your insight in this realm.
BTW, my 1st computer was an Atari 800XL which I purchased in 1985. I also wrote code in Atari Basic and in HiSoft Basic. Ten years later, I used the program I wrote to analyze the data for my Master's degree in Human Nutrition. With the Windows computers, writing code now has become too complicated for me, so I have given up on that endeavor.
I'd love to see a AIT and High Yield collab someday :D
Great information 😊
The BCM2837 SoC chip uses stacked RAM. The Raspberry Pi Foundation released the Pi Zero 2W in 2022 using it. So who stacked first? Regardless of who, it’s great to hear the designers are finding solutions to such huge (microscopic) problems!
You are an amazing vlogger and I love your accent :D
Well done excellent video and very informative 👍
Cool video. Perhaps in some future, memory is controlled by shadow? 🤔
Great explanation
Thanks dear, it's informative
I had thought of building memory (and the whole IC) in 3D 10 years ago. I think I even put the idea on my website years ago. One part of my idea that is not used yet is using microfluidics to cool chips that stack transistors in 3D, since stacking restricts heat transfer. The channels could run through many levels, and of course they need fluid-tight connections (a big problem). And use optics to communicate instead of a bus. Possibly LED or laser tech.
I worry about using non-volatile memory for primary or cache memory because of the security aspect. If the information remains after power is interrupted, quite a few "secrets" will be in clear text, and a determined and well-equipped "bad actor" will be able to extract surprising amounts of information from a system.
My industry has to issue letters of volatility with everything we produce, and for anything with NVM, the sanitization procedure usually involves removing the part with non-volatile storage and destroying it. The only exception is when it can be proven that the hardware is incapable of writing to that NVM from any component present on the assembly, even if malicious or maintenance software is loaded onto the device. This phase change memory built into the same package as the CPU logic could not be provably zeroized without some sort of non-bypassable hold-up power, and that would increase the cost and size of the chip package.
I think this is very promising for secondary addressable storage, but I don't see it replacing main memory in most applications.
Ms. Anastasia is so lovely, it's hard to concentrate on her narration, and it's not an easy subject to understand either.😊
Excellent analysis 👏🏾 👍🏾 👌🏾
One way of attacking the Memory Wall hierarchy is to attack it from the top: use RLDRAM, which has been around for >25 years but only in NPUs (network PUs), since it offers DRAM cycle rates closer to 1ns but a latency of 10ns, or 8 clocks. Since it is highly banked, 16-64 banks working concurrently allow for 8 memory accesses every 8 clocks, so throughput is 2 orders better than conventional DRAM. Of course, in single-thread use there is not much benefit, and keeping as many threads as possible in flight requires that threads select banks pseudo-randomly and not hit the same bank successively. This could be used as an extra layer between normal DRAM on slow DIMM packages and the first SRAM cache level; this RLDRAM layer is where it would be used, in CAM modules or soldered. We are substituting the Memory Wall for a Thread Wall here, but we are already used to having a dozen threads these days. The RLDRAM model could be applied one level lower down in an RLSRAM version, which would be perhaps several times faster and allow bank cycles and latency near 1-2ns, but still 8 clocks and 16 banks.
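As a minimal sketch of the banking argument above, here is a toy simulation: one new request per clock targets a random bank, each bank stays busy for 8 clocks (≈10 ns at ~1 ns per clock) after an access, and we count how many requests actually issue. All parameters are illustrative, not RLDRAM specifications, and conflicting requests are simply dropped rather than queued:

```python
import random

# Toy model of a highly banked memory: each bank is busy for BANK_CYCLE clocks
# after an access; a new request targets a random bank each clock and is
# dropped if that bank is still busy. Parameters are illustrative.

def throughput(num_banks, bank_cycle_clocks=8, sim_clocks=100_000, seed=0):
    rng = random.Random(seed)
    busy_until = [0] * num_banks
    issued = 0
    for clk in range(sim_clocks):
        bank = rng.randrange(num_banks)
        if busy_until[bank] <= clk:
            busy_until[bank] = clk + bank_cycle_clocks
            issued += 1
    return issued / sim_clocks      # accesses issued per clock

for banks in (1, 16, 64):
    print(f"{banks:>2} banks: {throughput(banks):.2f} accesses/clock")
```

A single bank settles at 1/8 of an access per clock, while 16-64 randomly selected banks get most of the way to one access per clock; the gap to the ideal shows the cost of purely random (rather than carefully pseudo-random) bank selection.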
So fancy! I think I want that laptop
Many years ago I wondered why transistors and memory were not stacked in 3D layers. I figured it was because of heat. My solution to that was microfluidics, and possibly sodium, to carry it away. I also thought light pipes (lasers) could replace the metal bus. A lot of work to make it to production, as the hardware needs to accommodate new kinds of connections.
Nicely done.
Non-volatile and low-latency at the same time, coupled with scalability and hopefully cost-effectiveness in manufacturing, would be a huge technological leap. Thank you for the information.
Great video - thank you Anastasi :-) I think if we stack much more memory as 3rd-level cache chiplets on top of CPUs we may reach gigabyte-sized 3rd-level caches. And this would eliminate the external DIMMs on the mainboard, which would make future notebooks and PCs cheaper again and reduce not just the complexity of the mainboard but also of the operating system, drivers and firmware, because data could be loaded directly from fast PCIe-connected SSDs into the 3rd-level cache.
I very much appreciate your videos and recommend them to every engineer I know !!
Thank you
An NVMe drive slotted into the DDR5 slot - direct-access storage - skipping a part of the memory hierarchy altogether. The system boots from storage/memory; slots 2, 3 and 4 are real RAM, just a dime's throw away
I love the way you explain the topic; it gets me thinking even though I have no idea. Like possibly folding the memory and interconnecting it to form cubes, because I always see dies represented in 2D. Like I said, not my field.
Thanks for the updates, really informative... I was working on OTP memory designs, and this new type of glass memory looks similar to the concept of OTP memory. Maybe we can see this kind of evolution on the OTP memory side also.
These new analog memories can be super efficient for LLMs because we don't need exact values, just approximate ones, so each cell could store a weight. We still need the analog multipliers, and then we'll finally have hardware neurons that will be way more efficient than the current systems, which are solely bound by memory bandwidth and CPU power consumption.
32/64bit APUs and 1bit LLM are a thing. AI is done on GPUs anyway, or better TPU/NPU, so we already have chips dedicated for it.
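A minimal numerical sketch of the analog in-memory multiply idea from the comment above: weights stored as quantized cell conductances, inputs applied as voltages, and outputs read as summed column currents (Ohm's law plus current summation). The matrix size, number of conductance levels, and noise figure are illustrative assumptions:

```python
import numpy as np

# Toy model of an analog in-memory matrix-vector multiply: each cell stores a
# weight as a conductance quantized to a few levels, inputs are applied as
# voltages, and each output is the summed column current plus a little read noise.
# Sizes, level count and noise are illustrative assumptions.

rng = np.random.default_rng(0)

weights = rng.standard_normal((4, 8))                     # ideal weights (4 outputs, 8 inputs)
levels = np.linspace(weights.min(), weights.max(), 16)    # 16 available conductance levels

# Program each cell to the nearest available conductance level
programmed = levels[np.abs(weights[..., None] - levels).argmin(axis=-1)]

x = rng.standard_normal(8)                 # input "voltages"
read_noise = rng.normal(0, 0.01, size=4)   # small per-column read noise

exact = weights @ x                        # digital reference result
analog = programmed @ x + read_noise       # what the array would return

print("exact :", np.round(exact, 3))
print("analog:", np.round(analog, 3))
print("max abs error:", np.abs(exact - analog).max())
```

The small error between the exact and "analog" results is the point of the comment: approximate weights are often good enough for neural-network inference, so the multiply-accumulate can happen inside the memory array itself.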
For quantum computers, this problem is even greater. There, the area of the RAM "pixels" is huge for now, and the speed of ordinary RAM is too low for quantum gates/switches.
RAM is not just memory, but arrays of NDR (negative differential resistance) counters.
Cooling the buried cores may present a problem in the future
Thanks!
Thank you
It's incredible how realistically AI creates movies. You can fall in love.
When new stuff comes into use it's nice to hear how it works and how it was developed. Thanks. I've been retired since 2005, when 3Com's cowardly lion closed its doors.
It should be mentioned that process node names like N3 or N5 are density measurements and not actually a transistor size. Intel 10nm was equivalent to TSMC 7nm; they average over different area sizes and utilize different shapes, so node names can't be compared directly, or even with the size of a silicon atom, which is only about 0.1 nm in "size".
You are brilliant! Great content. Thanks for this. ;)
The content, awesome. The jokes, not so much, lol. Thanks for sharing!
What really excites me about this new PCM technology is its analogue compatibility. I really think APUs will catch on within the next 10 years or so, and this type of RAM is perfect for that application
I don't understand how chiplets are such a huge "revelation." I recall that the original Pentium II chip had SRAM chips in the same package as the CPU. It wasn't as sophisticated as today's chiplets, but it is safe to say that the Pentium II can be considered a conceptual precursor to modern chiplet technology. The use of multiple chips within a single package to enhance performance and functionality laid foundational ideas that have evolved into today's chiplet architectures. While the Pentium II's design was focused on integrating a separate L2 cache with the processor in a cartridge format, it demonstrated the potential benefits of modular and multi-chip approaches. This concept has been significantly refined and expanded in modern chiplet technology, where multiple dies are integrated within a single package to optimize performance, cost, and scalability.
3 nm and so on is a marketing term that no longer has any relation to any dimension of the transistors. The true gate width until now is 14 nm, due to the limitations of ASML's lithography machines. The next step for the next decade is going down to 8 nm (about 80 atoms wide).
Xilinx's (now AMD) HBM products were combining FPGAs with DRAM chiplets on a silicon interconnect substrate back in 2018.
Altera released similar tech a year later.
My concern with the phase change memory is just the lifetime and reliability. Do the cells grow oxides or change chemistry over time? Can they be ruined by ripple or electrical noise at scale that hasn't been discovered yet? Etc. Love your videos!
That explains why chips started to stagnate and then grow bigger in size instead of always getting smaller with more transistors. For the SRAM problem, I am thinking it will most likely stay the same size even with the new technology, for the reason she said: it needs to do the same thing to hold and then retrieve data. The way I see it going is how AMD seems to be doing it - presented to the public as adding more L3 in a vertical stack, which is akin to layering SRAM on top of itself instead of having a single 2D plane. That, plus most likely a key or frequency or whatever they are going to use inside the chip for finding which L3 layer the data is in.
For L1, sadly I do not see it shrinking in physical area due to physics. Maybe they can redesign it again to be smaller, but there are only so many redesigns before you hit a wall, find the perfect design, or reduce efficiency.
OK, I'm calling this out as not feasible in lots of cases. The issue is that SRAM needs to be tightly coupled into an architecture to get the performance benefit. However, if a bond-out pad is required (e.g. for a chiplet) via a Bunch of Wires interface, then there will be a delay penalty due to capacitance and transmission-line issues. This means added latency and a performance hit. It might be useful for L2 cache, but for anything local it is of no use. SRAM at the local level is still the best solution.
I believe that the problem of quantum tunneling limiting size can be addressed by temperature as well as ionic state. This implies that mechanical cooling is a requirement, which is expensive and inefficient.
Damn it! I thought of this back in 1984 and was told it wasn't possible... 😢
Great content! Please stop the zoom in/out effect when cutting your video. There is so much of it I get nauseous watching. Otherwise, you do an excellent job explaining this interesting topic.