Greetings from the Intel 20A BPD team! I was involved with Blue Sky Creek (BSC) in its early stages, but was moved from that team to work on 20A proper before it hit the labs post-silicon. I did my PhD on the development of modeling and optimization methods for chip-to-chip power delivery, focusing on multi-chip-module designs and finishing in 2020. Feel free to ask me anything about this new technology and I will happily share what I can.
@@HighYield I could nitpick stuff for sure, but you didn't miss much. One thing that might have been worth mentioning, since I see a lot of comments asking about it or misinterpreting it, is the thermal consequences of the new design. With the transistor layer moved into the middle, it's true that you have to pull the heat from the transistors through more silicon, but the reduced waste heat from other parts of the die, such as the significantly lower resistance in the power vias, more than makes up for it. I think we measured something like 20-30% lower resistance in some scenarios, which means a lot less voltage drop from contacts to transistors. That resistive power loss is a significant portion of the heat generated by a modern chip. I can't say exactly how much, but ARL could actually use that to see a reduction in max power for the first time in many Intel generations.
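For anyone wondering how lower resistance turns into less heat and less voltage drop, here's a tiny back-of-the-envelope sketch. The numbers (100 A, 0.5 mΩ) are hypothetical round figures picked purely for illustration; only the 30% reduction comes from the comment above.

```python
# Toy numbers showing why lower via resistance matters; these are made-up
# illustrative values, not actual Intel measurements.

def ir_drop_and_loss(current_a, resistance_ohm):
    """Return (voltage drop in V, power lost as heat in W) for a supply path."""
    v_drop = current_a * resistance_ohm       # Ohm's law: V = I * R
    p_loss = current_a ** 2 * resistance_ohm  # Joule heating: P = I^2 * R
    return v_drop, p_loss

# Say a core rail draws 100 A through a 0.5 milliohm delivery network,
# and compare against a 30% resistance reduction like the one cited above.
v_old, p_old = ir_drop_and_loss(100, 0.0005)
v_new, p_new = ir_drop_and_loss(100, 0.0005 * 0.7)

print(f"old: {v_old * 1000:.0f} mV drop, {p_old:.1f} W lost as heat")
print(f"new: {v_new * 1000:.0f} mV drop, {p_new:.1f} W lost as heat")
```

Because the loss goes with I²R, every bit of resistance you shave off the delivery network comes straight out of the waste-heat budget at the same current.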
I've got a couple of questions for you. I'm currently studying computer architecture, so I might just be missing some common knowledge about manufacturing processes. 1) If the pin connections are now on the backside along with the PD network, how do external data signals connect to the signal network? 2) Is more material lost/wasted due to adding a new carrier wafer and removing the old one? (8:50)
@@parsnip908 Power and I/O are still routed to the same side eventually. This isn't reflected well in diagrams because you have to cut away somewhere. Most of what is on the frontside is the communication network on the chip, as this is what really dominates those low metal layers and gets in the way of other stuff. BPD should really be called BPD&I/O. You can still mount the chip in a BGA form factor like normal. There is technically more waste material as you use 2 wafers in the production process, but since the yield can be quite high and the pitches in the metal layers relaxed, you can actually save money in the production process. This is part of why 18A will be offered externally after the technology proves itself on 20A.
Small clarification regarding the M0, M1, M2, etc. labeling. Using the BEOL image from Wikipedia at 5:45, you identify and highlight "M0" as the lowest metal layer in the BEOL steps. In your image, M0 would actually be the tungsten metal, which is identified as part of the FEOL processes.

M0 is often called a "local interconnect", because it is a metal layer laid directly on the Si surface and used to connect immediately neighboring transistors together. For example, in an inverter gate with 1 NMOS and 1 PMOS, where the drain of each transistor is connected to the same node, a local interconnect would be used to connect the drains, by depositing an M0 layer directly on the Si, contacting the doped areas of the transistors that form the drains. M0 is made of metals like tungsten or titanium nitride, not the typical copper or aluminum used in the higher layers. So M0 is a "special" metal layer used for local interconnects, because it uses a different material and has to be deposited and formed in different ways than the other metals, since it's in direct contact with the silicon. It's also used as the "contact" metal for higher metal layers that need to reach the silicon: M0 is the only layer that actually makes contact with the silicon, being deposited directly onto the sources and drains of the transistors to form vertical contact structures.

Then, when the wafers move to BEOL, the first "typical" metal layer, M1, is made of copper, and it goes down to touch the top of the M0 layer. In your BEOL image, you can see that the orange metal layers are labeled Cu1 (Metal 1), Cu2 (M2), up to Cu5 (M5). Each of these really comes as a pair of layers, because each metal layer requires one layer for "vias", the vertical structures that make contact between two metal layers, and one layer for the horizontal metal wiring itself.
In the image, "Cu1" actually includes "Via1" (the vertical contact between the top of M0 and the bottom of Copper 1) and "Copper 1" (the Metal 1 layer that makes the interconnects between blocks). Via1 and Copper1 each require their own photomasks and process steps. Then "Cu2" is actually "Via2" (the vertical connection between the top of Copper1 and the bottom of Copper2) and "Copper 2", again each requiring their own masks and steps.
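The via/metal pairing above means the mask count grows twice as fast as the layer count. A quick sketch of that bookkeeping (real processes add even more masks for double patterning, cut masks, etc., so treat this as a lower bound for illustration only):

```python
# Each copper layer in the stack described above needs two mask steps:
# one for its vias (down to the layer below) and one for the metal itself.
metal_layers = ["Cu1", "Cu2", "Cu3", "Cu4", "Cu5"]

mask_steps = []
for layer in metal_layers:
    mask_steps.append(f"Via for {layer}")    # vertical link to the layer below
    mask_steps.append(f"Metal for {layer}")  # the horizontal wiring itself

print(f"{len(metal_layers)} metal layers -> {len(mask_steps)} BEOL mask steps")
# -> 5 metal layers -> 10 BEOL mask steps
```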
Yeah... yeah, it's crazy. It's a feat of mass-production engineering. We're at the point where the technology is reaching its limits, so they have to precisely improve everything they possibly can to get more performance.
When I hold a chip when I'm building a computer or something, I get the chills. This is the closest thing to "alien technology" we have IMO. The fact that you can get such an artifact for some hundreds of dollars is insane. Not to mention the enormous use you get out of it.
In the '90s, my Granddaddy had the same awe & amazement as he told me about the very first radio he ever witnessed, before WWII. "It was a machine that stood on the floor, right in front of an open window. It had a wire attached to the back of it that went out the window to a pole. With nothing else attached to it..." 🤔😳
@@lilblackduc7312 That's an AM radio receiver. Yes, you can run those radios with just an antenna, no power from an outlet needed, if the antenna wire is long enough. My electronics teacher back in junior high told us the wire had to be at least as tall as a coconut tree to receive AM radio signals in my local area.
I work on GlobalFoundries' 22FDX process node, a 22nm FD-SOI process, doing RFIC layout design, and I gotta admit, this would be ridiculously useful in ways I can't even describe for the RFIC or analog industry. I imagine it gets significantly more useful the smaller your process node, since the resistances get real high real quick when your metal connections have to be incredibly thin and the vias are tiny. If I could put my thick power delivery wires on the back and not have to share the area over a bank of devices with the data wires, my life would be a hundred times easier and I could work at twice the speed. This could even allow some sort of automation for the power routing. Everything would become significantly more power efficient and thermally ideal by doing this too. I must admit, I was rather surprised to find there was only one bottom metal layer (M0). In my process node we have M1 and M2 before going to higher-layer block routing metals. I guess that makes more sense for a larger process node and for RFIC design.
I can't say a whole lot right now, but I will say that there is active development for the type of power routing tools you're talking about. At the very least there is talk of offering assisted power routing for some 18A customers.
I keep on wondering: wouldn't it make sense to try to back port some of these Backside Power Delivery approaches also to older / larger nodes, like the 28 nm & 22 nm nodes which are still largely preferred for economic designs, especially considering that BPD promises both cost & power advantages? (Of course assuming that customers of these nodes are willing to invest in new layout design methodology & update their cell libraries)
You are correct! I have been so happy I found his videos shortly after he started. I have always been impressed, and he quickly became my favorite hardware information resource. I'm so happy to see him getting the appreciation he deserves! It's amazing how fast his audience has grown, as well as how he hasn't let it go to his head. You can tell he does it because he likes it. None of that arrogant "blah blah blah join this community (based around myself!!)" nonsense. No pumping out videos just to game the algorithm, etc. Just good information, excellent insight, and straightforwardness (not sure that's a word, but oh well). The only problem with him not kissing the algorithm's butt is that YT almost never notifies me about his videos, even though I am subscribed and have all notifications on. Anyway, enough kissing his ___😊
Absolutely insane. I had to pause at some of the slides just to stare at the complexity. Every time I say "that is surely it. They can't go further than that!" a new development blows my mind. Bring on the sandwich era of chips!
Amazing. I also think having separate metal layers for data and power could allow for easier "gluing" of smaller dies together to form a bigger chip, as you can route the I/O on top without going to the package substrate or even solder bumps. Glue the chips together, then etch the hermetic seal and build yet another, bigger metal layer on top of the smaller glued dies to make the I/O path even shorter. Thus you could make a huge die, but with the yield of smaller dies. That wasn't possible with flip chips, but now one could imagine it being possible.
Another fantastic video man👏🏻👏🏻You’re literally one of my top 3 favorite tech channels, great combination of in-depth knowledge and understandable delivery! Please never stop making these quality videos!👍🏻
I do have to wonder about thermals and whether that will limit the benefits for P-cores. The main heat-generating component is the transistor layer, and this method puts the signal layers between that layer and the cooling system. People used to polish down their CPUs just to shave this distance a little and improve performance, so I don't think it will be insignificant.
Everything you said makes complete sense from a physical point of view, but they did claim a 6% frequency increase, so the lower power losses combined with better signal quality must completely offset the higher thermal resistance. After all, all that wasted power in the classic design was ending up as heat, and a suboptimal signal network meant they had to use more power to hit their frequency targets.
@@sznikers Pretty much exactly this. Cleaner signals means you don't spend as much power cranking your transmit and receive points, and you lose less to resistance everywhere since the paths can be direct and physically thicker.
Intel's current method is to push the cores as high as you can. With BPD&I/O, you have to find a good balance between how much power you can push vs. what performance target you want. It's an added constraint to manage.
As well as adding power capacitance very near the transistors, allowing lower noise. Large power-plane surfaces act as free massive capacitors. Nvidia used available space in the metal layers to make capacitors to reduce power noise. On top of that, keeping the (now lower) power noise away from the signals will help too.
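To get a feel for how much "free" capacitance big overlapping power/ground planes give you, here's a rough parallel-plate estimate (C = k·ε₀·A/d). Every dimension here is a hypothetical round number chosen for illustration, not from any real chip:

```python
# Parallel-plate estimate of built-in decoupling capacitance from two
# overlapping power/ground planes. All dimensions are made up for illustration.

EPS0 = 8.854e-12  # vacuum permittivity, F/m
K_OXIDE = 3.9     # relative permittivity of SiO2

def plane_capacitance(area_m2, separation_m, k=K_OXIDE):
    """Ideal parallel-plate capacitance C = k * eps0 * A / d."""
    return k * EPS0 * area_m2 / separation_m

# Two overlapping 50 mm^2 planes separated by 100 nm of oxide:
c = plane_capacitance(50e-6, 100e-9)
print(f"~{c * 1e9:.0f} nF of built-in decoupling")
```

Tens of nanofarads sitting right next to the transistors is exactly the kind of local charge reservoir that smooths out supply noise.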
Lifelong learner, first time down this particular rabbit hole. Comparatively, this video worked for me. It nicely highlighted the theoretical limitations of the current design and the advantages of the new design. The first lightbulb moment was learning which side is the frontside vs the backside.
You didn't comment on heat dissipation. With more layers underneath the transistors, cooling could become more difficult, or there could even be more heat. I hope for more videos like this, thanks for the content!
Intel said there isn't much of a difference, and the increase in efficiency (less energy lost to resistance = less heat) more than makes up for any decrease in thermal capabilities.
I just saw someone adding cooling to both sides of a motherboard which didn't make much sense to me at the time. And the POC didn't work very well. I would think that the power delivery side would need the most heat dissipation. This seems super promising.
Super technical video, presented in an excellent manner, and dense with actual information. Too often, people make videos where they TALK a lot of words but don't really SAY anything! Great video.
Super curious about the primary heat path in this configuration. It seems like it would have to be down into the power layers and ultimately into the substrate and PCB.
The chip-machine maker ASML has been redesigning some parts that I/we make as their supplier. One of the assemblies was pretty much flipped upside down, but the changes are only expected to hit production in a year or two. Fun to see these kinds of tech developments reflected in things I see happening at work, because as a supplier of parts you never get the full picture of what they're actually doing.
Yes, I believe Intel is looking at making stacked "CFET" devices, which effectively form a vertical NMOS/PMOS pair that can be connected on both sides.
What about cooling? Having the transistor side closer to the heatsink significantly improves heat transfer, I would imagine. Considering they are pumping hundreds of watts into their chips, it sounds like this could actually decrease performance, as e.g. "turbo boost" wouldn't be able to clock as high.
The gains from the lower internal resistance pretty much completely offset this. Blue Sky Creek saw a 5-10% clock speed increase over production-volume Intel 4 node chips. For example, if Meteor Lake were using this test node, you would see boost clocks about 400 MHz higher on the P-cores and 250 MHz higher on the E-cores, which doesn't sound like much, but it is done at the same power draw.
Great video :) I was wondering, why don't they create the signal layers first, then the transistors and then the power delivery? That way you would avoid grinding down to the transistors and adding structural support again.
That would need an entire redesign of how the silicon lithography process works, I think, which would lead to many new costs for validation and the necessary R&D.
I think the transistors themselves need the silicon wafer; they can deposit a metal layer on top of silicon, but not a silicon layer on top of metal. My best guess, not an expert.
@@eggnogg8086 @jorenboulanger4347 - that's right. The silicon substrate (the wafer itself) is the magic that makes semiconductors work, and transistors have to be built directly into the silicon. So the FEOL processes that @highyield talked about embed the transistors directly in the silicon (with 3D transistors like FinFETs and nanosheets there is some deposition above the Si as well, but the point remains: the transistors are built "in" the silicon). Then we need metal to connect the transistors together. That's the BEOL processes, which build up layers of material on top of the silicon surface: metal, oxides, insulators, etc.

So essentially, a bare silicon wafer, a nearly perfectly pure crystalline substrate, is required to build the transistors, and then metal layers are added on top, which can be deposited using other processes. You can't, for example, build a bunch of metal layers on top of the wafer first, then build the transistors on top of those metals, then add more metal on top of the transistors, because the transistors have to be formed directly in/on the silicon wafer. Since you're starting with a Si wafer, you have to build the transistors first, then add metal layers on top. Then you flip the whole thing over, grind the backside of the wafer down until the bottoms of the embedded transistors are nearly exposed, and build up another set of metal layers, creating a metal/silicon/metal sandwich.
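The whole flow described above boils down to a strictly ordered sequence. A condensed list (the step wording is mine, not official process terminology):

```python
# The wafer flow from the explanation above, as an ordered list. The key
# constraint forcing this order: transistors can only be formed in crystalline
# silicon, never on top of metal.
process_flow = [
    "FEOL: build transistors directly in/on the bare crystalline Si wafer",
    "BEOL: deposit the metal/oxide layers on top (signal network)",
    "Bond a carrier wafer onto the finished frontside",
    "Flip the stack and grind the original backside down to near the transistors",
    "Build the backside metal layers (power delivery network)",
]

for step_number, step in enumerate(process_flow, start=1):
    print(f"{step_number}. {step}")
```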
Thank you for such an interesting video. A question: do you know if PowerVia will help scale the I/O and analog parts of chips further down again? New manufacturing nodes only scale compute further down, while analog and SRAM are lagging there. Does PowerVia help get better density there again? ❤
This is the best, most clearly explained video on the topic of BSPD that I've seen so far. Even the manufacturers themselves do a worse job of explaining it. Good stuff.
Why is a BSPD chip not flipped at the end? It gets flipped to build the backside on top of the wafer, and should then be flipped again to connect it to the substrate, or what am I missing? (Timestamp: 11:05)
Flip-chip refers to the frontside (= the top of the wafer) facing downward. With BSPD the backside is down and the frontside is up, so it's technically not in a flipped position when finished. It's not about whether you flip the chip during manufacturing. Maybe I could have explained it better.
Great video! A question on the backside I/O routing: obviously the I/Os need to connect in some shape or form to the topside signal routing. How is this done? Do you have to route from the backside through the silicon and out to one of the topside metals? I can't visualize this, and at which topside metal layer does the I/O connect? Is there one large via through from back to front?
The frontside doesn’t need to be connected to the PCB, as it only handles chip internal communication. Power supply and I/O is routed through the backside, which is the side connected to the PCB.
A clear overview, insightful analysis, and a very beautiful review. All in all, a very beautiful presentation on the topic of power and signal distribution systems and technology in semiconductor devices. 👏👏👏👌👌👌
Does it have uses in complex chip designs, or just small low-power chips? And how will you use active cooling on chips where the higher-voltage power delivery sits under the chip and there's an extra, less thermally conductive layer of wafer on top?
I'm very happy to see this video. A slight bit of criticism: give yourself time to speak slower sometimes. I'm around 7:56 and it is hard to understand you; it almost feels like you're trying to speak as fast as possible. I don't think it applies to the whole video, but it does to some segments, so if the video ended up 20 seconds longer it wouldn't hurt. But your content is top notch, and indeed the problem of power delivery, data transfer, wiring etc. is one that is constantly evolving. Meaning that even if you find a solution, people will often take it for granted, and within a couple of years you will have to find other solutions to very similar problems. And yes, half of the evolution of microchips might be smaller nodes, but the other half is how to get them powered and linked. A very underappreciated topic, so I'm glad you're covering it! Good job! 👍
I played the video at double speed and had no problem understanding High Yield. I think that he articulates very well for a non-native speaker. Just my 2 cents.
@@aapje Agreed. It was only around that timestamp, the first few sentences of the chapter "Backside Power Delivery manufacturing", that I had trouble. Though I'm not a native speaker myself, and I wouldn't be able to follow half of YouTube at 2x speed.
On the old rules, Meteor Lake's per-unit cost across risk, ramp, peak, and run-down will progressively move down from $227 per unit at risk production, through peak and just past peak, to an $88 marginal cost per unit at the 36-millionth unit of first-gen disaggregated SiP production. The question is where learning stands right now on realizing those marginal cost reduction objectives. mb
To clarify, this will not affect how these get connected to boards, correct? It will, however, mean growing a layer of silicon on top of one of those metal layers. I wonder how that will work out. I'm only used to starting with one wafer and then building on top.
1:40 What if you do it like a PCB and make double-sided chips, or 3-layer silicon (or a silicon substitute) with power delivery provided from said middle layer? Now take AMD Ryzen: instead of just using the top surface, you could put another full Ryzen on the backside, use a sort of middle interposer for power delivery, and use some custom logic to simply run it like a multi-core CPU with hyperthreading. Now you could realistically get a 64-core CPU or higher. Same with a GPU.
In effect you split the die thickness in half, with a center interposer that provides syncing and glue logic, and you can still achieve backside power delivery for both physical units.
It'd be a weird-looking package and would not be compatible with current sockets due to cooling system changes. In this context you'd probably want a heatsink with a dissipation capacity of 300 W and higher on its own, with liquid cooling. The chip package would probably resemble a PLCC-style chip with the pins or contacts around the edges and towards the sides. Because both sides are used, that would allow you to mount the chip on a riser card to facilitate cooling on both sides, at the cost of increased space and overhead.
Call it full silicon fabrication. We already effectively 3D-print certain chips like DRAM anyway to get storage density, so making a layered silicon "PCB" shouldn't be a wild idea. Because that's all a chip is: a PCB, on a PCB, connected to other PCBs, each at a different scale. You just shrink the PCB down until you can only see it under SEM or TEM photography.
Appreciate the video. How does this approach translate to clock speeds? If the power network is longer, does that mean lower clock speeds? Also, if it has so many advantages and is even cheaper to manufacture, why hasn't this been done before? 17:30 In this frame you can clearly see Mitsui; I had no idea they made semiconductors as well. Are the Japanese also at the cutting edge of semiconductor manufacturing?
The reduction in voltage drop leads to a cleaner and more stable supply of voltage to the cores, which results in higher clock speeds. Just like (on a different level) a better and more stable PSU can also help stabilize an overclock. It hasn't been done before because up until recently you didn't really need it; frontside power was good enough. Only in recent years have the metal layers become so complex that looking for a solution became viable. As for Mitsui, I think we are looking at a photomask or a cover for one. It's a mask for Intel's Meteor Lake.
Given the significant benefits of backside power delivery outlined in the video, how might this technology influence the future landscape of consumer electronics, especially in terms of device miniaturization and energy consumption?
And then we can start playing with architectures that have top layer "master" cores, in setups with bottom layer sets of 2,4, 8, 16 small "slave" cores.
Since the first wafer is removed it could be thinner to begin with, saving wafer cost and time to grind it away. Also the final structural wafer wouldn't need to be suitable for transistor production, allowing cheaper materials (failed wafer recycling?) or even something with better thermal transfer.
Samsung's isn't far off from PowerVia, but it would be an interesting video to compare the 2 once both are on the market. Hopefully we could get a comparison of 2 actual dies and see what makes them tick.
I actually did my PhD on pretty much exactly this. MCM and BPD solve completely different problems, and in fact you can use BPD to boost an MCM design. So it's quite the opposite really; the two technologies work very well together.

MCM allows you to scale beyond the reticle limit without going to complex methods like pattern stitching (which is how Cerebras makes their wafer-scale chips). You see this pretty well with Nvidia's B100 and AMD's MI300 series chips, which this channel has a fantastic video on. It can also be used to scale beyond a practical limit with yields or cost, which we see with AMD's Ryzen and Navi31 and 32 dies using older, more mature nodes for I/O functionality, or with Intel's Meteor Lake optimizing different dies for one specific task each.

BPD works within the die itself to free the internal signalling and the external power and I/O networks from each other, giving more space to optimize both and lowering internal resistance with thicker wiring, for lack of a better term. This helps your MCM design, as your chips have to communicate somehow, and that freed space on the power & I/O side means better links between dies. You can mount a bunch of BPD chips on an interposer to make an MCM design just like traditional dies; Intel has that in the pipeline already for some 18A-based server chips. You can even apply the same principles to that interposer if you want, using multiple layers for die-to-die vs. external communication and package-level power.
Im curious where you got the image you used for the thumbnail? Was this an AI generated image, or did you find it somewhere? Could you share a higher res version?
@@HighYield I really like the look of it as a stylistic reference. If you could share it, that would be great. It's just hard to make out the details at such a small size.
@@HighYield Very nice! Which AI did you use? I gave it a quick try with Adobe Firefly and it looked like crap. This is super clean and detailed. Thanks for sharing!
You don't need as many connections for power delivery as for signals, so far fewer "power" wires move around to the other side. And having signal and power separated of course decreases interference between them.
There are only external contacts on the backside. The frontside is now entirely dedicated to internal routing. It should be called BPD+IO. Your current chips are face-down, these will be face-up, so technically no longer flip-chip.
It sounds so easy in words, but... how do you detach the nanometer-scale thin layer of ready-made circuitry from the base crystal (remember, this is a SINGLE crystal) and flip it over???
Power efficiency allowing higher clocks at lower voltage means less heat, but it also means more overclocking headroom with better cooling solutions. Overall it scales the same as today: you get the best speed you can for the temperature you can maintain. What it really means is that devices are going to get even faster and more efficient.
One variable I would like to know is which part of the die generates more heat than the others. I'm assuming it would be the power delivery and I/O, but if this design improves those power lanes, I think that might reduce the waste heat generation. I'm not sure of those details, but it would be nice to know.
I am not a qualified semiconductor expert, but assuming that the total die thickness stays the same, I would think that backside power delivery would improve temps, and not only because of the improved power efficiency. The transistors have been moved slightly closer to the IHS, and they have the copper signal wires between the transistors and the IHS conducting the heat towards the IHS. on top of that, the larger power wires may also improve temperatures due to their increased thermal mass.
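A 1D conduction model (R_th = thickness / (conductivity × area)) gives a rough feel for this "transistors closer to the IHS" argument. Every number below is a hypothetical round figure I picked for illustration; real die thicknesses, conductivities, and heat paths vary a lot.

```python
# Back-of-the-envelope heat-path comparison. All values are made-up
# illustrative figures, purely to show the shape of the argument.

def slab_r_th(thickness_m, k_w_per_m_k, area_m2):
    """Thermal resistance of 1D conduction through a slab, in K/W."""
    return thickness_m / (k_w_per_m_k * area_m2)

AREA = 1e-4  # assume a 1 cm^2 die

# Conventional flip-chip: heat crosses a full ~750 um Si substrate (k ~150 W/mK)
r_conventional = slab_r_th(750e-6, 150, AREA)
# BSPD (face-up): heat crosses a ~10 um BEOL stack, assuming a lower effective
# conductivity (~5 W/mK) for the mixed copper/oxide layers
r_bspd = slab_r_th(10e-6, 5, AREA)

for name, r in [("conventional", r_conventional), ("BSPD", r_bspd)]:
    print(f"{name}: {r:.3f} K/W -> {100 * r:.1f} K rise at 100 W")
```

With these (made-up) numbers, the much thinner stack wins even at much worse effective conductivity, which matches the intuition in the comment above.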
Maybe this way will make it more clear: there are 3 types of metal connections, transistor-to-transistor, power-to-transistor, and transistor-to-I/O. Since the transistors are all on one plane, the majority of the metal layer connections are transistor-to-transistor. What backside power delivery does is put all the transistor-to-transistor metal layers on the front side, and put the power delivery and I/O metal layers on the backside. Then the front side faces your heatsink, and the backside connects to micro bumps that eventually go to your motherboard.
Complexity and costs skyrocket with EUV nodes, and the even smaller feature sizes needed a novel metal-layer solution; it was becoming too hard to strike a balance between data and power signals.
Great summary and explanation. In addition to backside power delivery, I think we have already seen another thing that helped with segregation of data/memory and power. That was the use of chiplets or tiles. Instead of having a monolithic piece of Si as one CPU chip that has everything, now they have different chiplets purposely optimized for each function.
Basically, the reason for backside power delivery is that transistors still follow Moore's law, but wires never followed Moore's law! I learned this from Preparata in a circuit complexity class in the 1980s... If you shrink a wire, you have to pump more current through it to overcome the increased resistance and maintain latency! We had a one-time scaling of wires in the 1990s when we converted from aluminum interconnect to (more conductive) copper interconnect. And from your VLSI chip interconnect images, it appears that wires have scaled a 2nd time, in 3D. So this is perhaps the third and last time we can scale the wires!
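The "wires don't follow Moore's law" point falls straight out of R = ρL/(WH): shrink every dimension by s < 1 and the wire's length drops by s but its cross-section drops by s², so resistance grows by 1/s. A quick sketch with hypothetical dimensions (bulk copper resistivity, ignoring the surface scattering that makes real thin wires even worse):

```python
# Why wires never followed Moore's law: R = rho * L / (W * H).
RHO_CU = 1.7e-8  # bulk copper resistivity, ohm*m (thin real wires are worse
                 # due to surface and grain-boundary scattering)

def wire_resistance(length_m, width_m, height_m, rho=RHO_CU):
    return rho * length_m / (width_m * height_m)

s = 0.7  # one hypothetical node shrink factor
r_before = wire_resistance(100e-6, 100e-9, 200e-9)                # a 100 um wire
r_after = wire_resistance(100e-6 * s, 100e-9 * s, 200e-9 * s)     # fully scaled

print(f"before: {r_before:.0f} ohm, after shrink: {r_after:.0f} ohm")
print(f"ratio: {r_after / r_before:.2f} (= 1/s = {1 / s:.2f})")
```

So every uniform shrink makes an equivalently scaled wire 1/s times more resistive, which is exactly why moving the thick power wiring to its own side of the die is so attractive.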
My question: will Intel's next-generation desktop processors coming at the end of 2024, the Arrow Lake desktop chips that will battle AMD's Ryzen 9000, use the Intel 4 process or the more advanced Intel 20A??? How is it possible that Intel is taking so long to catch up with TSMC's process node?
Absolutely the best, most informative video I've seen in a very long time. I plan to share this video with all the keyboard jockeys who have jumped on the bandwagon lately; all the negative Intel chatter on the web has nearly made me sick. This information is going to fill their minds with knowledge, the good kind. Thanks a million, good job.
That part at 13:50 should be an obvious result; I was thinking it as soon as I saw how the separation of data and power signals works on the die. If today's high-NA EUV lithography is compatible with all this, and I don't know why it shouldn't be, I see this getting us below "2" or "20A" process nodes and easily down to "1.7" or "17A". (And I'm quoting these numbers since they're advertising numbers and don't actually refer to any measurement.) I think this puts us on the path to 500 million transistors/sq. mm, AND it should allow transmission logic (transmitting signals off die, which right now doesn't scale past about "6" for process nodes) to start shrinking again. As this matures it should get us to "14A", and this is the point where I start to question whether the world needs higher density than that. Small devices won't, PCs won't, laptops won't. The main thing they need is power reduction, and this looks like a big step towards that.
@@aapje I'll copy/paste the other comment I made because it gives more detail on this particular point. I agree we need more compute power in most devices than we have now. I don't necessarily believe we'll need much more than what we'll get to in about 10 years for mostly home devices, and what we'll need more is less power consumption. This is a personal opinion. I don't think we need a world where anyone can hold the power in their hand to hack any organization and bankrupt entities while they're stealing their money, from their smartphone. I just don't. That world sounds too dangerous. So, the rest of my thought: "Something I've said for the last 3 - 4 years, in paying attention to what transistor density is moving towards, I've been skeptical of these neverending videos about having to move on from silicon based ICs because we are near its limits. Well, while we might be approaching a point of diminishing returns for transistor density what I have said is the more important issue is going to be one of power consumption because really, approaching 500 million transistor in a sq. mm is a LOT. As in, you can have a powerful computer in your hands and that day is already coming as handheld game devices have shown. Humans will never need a handheld that can solve every secret of the universe. To be able to get the power of today's desktop computers which are pretty powerful into a handheld is about as good as it needs to get and we can solve that problem with silicon based circuits. The bigger issue is engineers needing to get better and better about parallel processes, not only in software but hardware so we don't need to clock a handheld device at 6GHz." And what I meant by parallel processing is not the same as what happens now in regards to hardware, which is many core CPUs. I mean what happens in a single core, where right now branch prediction does a good job of speeding up the time a thread is completed. 
This is more in line with what I mean, where a core gets better at parallelizing a string of instructions, because this is always the hangup: how fast a single thread can be completed. Right now we mostly rely on ever-faster clock speeds, although Intel and AMD are getting better with their architectures. For software, on the other hand, I mean software engineers need to get better at parallel processing. Some companies are doing a good job, others aren't.
@@johndoh5182 I think that you have a completely unrealistic view of what even a 100x increase in compute power would do. For example, most hacking doesn't involve brute forcing, and that which does is done by governments on huge server farms with way more than 100x the power of a single server, let alone a single handheld. And they still can't just hack or bankrupt any organization at will. And single threading will most likely forever be the issue, because if you add more parallelization and add more cores to match, you'll just get another single-core bottleneck when adding cores no longer helps.
@@aapje "And single threading will most likely forever be the issue, because if you add more parallelization and add more cores to match, you'll just get another single-core bottleneck when adding cores no longer helps." Yeah, you didn't read that part, because that's what I said: "And what I meant by parallel processing is not the same as what happens now in hardware, which is many-core CPUs. I mean what happens in a single core, where right now branch prediction does a good job of speeding up the time a thread is completed. This is more in line with what I mean, where a core gets better at parallelizing a string of instructions, because this is always the hangup: how fast a single thread can be completed. Right now we mostly rely on ever-faster clock speeds, although Intel and AMD are getting better with their architectures."
@@aapje "I think that you have a completely unrealistic view of what even a 100x increase in compute power would do. " Cool, it wasn't really meant to be a factual statement. I don't think a smartphone needs to be 100X more performant than they already are. Once again, that's just me. On the other hand about 5X the power of current new gen products and a 50% power reduction sounds pretty great.
An important note: Intel's next-gen backside power delivery nodes can, at best, match TSMC's traditional data + power nodes in perf/W, density, and cost *according to Intel*. Even Intel knows 18A's PowerVia / backside power node won't beat TSMC's traditional nodes on all major targets. Thus, backside power delivery improvements are most accurately compared intra-foundry ("Intel backside is better than Intel traditional", "TSMC backside is better than TSMC traditional"). It's only with Intel's 2nd-generation backside power delivery in Intel 14A that Intel thinks they can beat TSMC on perf/W and cost, and maybe density.
@@AstrogatorX The current rumors say some Arrow Lake-H chips (6+8 mobile). Lunar Lake will be N3B and for Arrow Lake-S, the CPU tile also seems to be N3B. Capacity booked by Intel before they knew if 20A would be ready in time.
Greetings from the Intel 20A BPD team! I was involved with Blue Sky Creek (BSC) in its early stages, but was moved from the team to work on 20A proper before they hit the labs post-silicon. I did my PhD on the development of modeling and optimization methods for chip to chip power delivery, focusing on multi-chip-module designs and finishing in 2020. Feel free to ask me anything about this new technology and I will happily share what I can.
My first question would be, how much did I get wrong? :D
@@HighYield I could nitpick stuff for sure, but you didn't miss much. One thing that maybe would have been good to mention, since I see a lot of comments asking about it or misinterpreting it, is the thermal consequences of the new design.
With the transistor layer moved into the middle, it is true that you have to pull the heat from it through more silicon, but the reduced waste heat from other parts of the die, such as the significantly lower resistance in the power vias, more than makes up for it. I think we measured something like 20-30% lower resistance in some scenarios, which means a lot less voltage drop from contacts to transistors. That resistive power loss is a significant portion of the heat generated by a modern chip. I can't say exactly how much, but ARL could actually use that to see a reduction in max power for the first time in a lot of Intel generations.
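To make that concrete, here is a back-of-the-envelope sketch of how lower power-network resistance cuts both the voltage droop and the resistive heating. The current and resistance values are made-up illustrative numbers, and the 25% reduction is just the middle of the quoted 20-30% range:

```python
# Back-of-the-envelope model of how lower power-network resistance
# reduces both IR droop and resistive (I^2 * R) heating.
# All numbers are illustrative assumptions, not measured Intel data.

i_core = 100.0           # amps drawn by a core cluster (assumed)
r_front = 0.5e-3         # ohms, frontside power network (assumed)
r_back = r_front * 0.75  # ~25% lower resistance with backside delivery

def droop_and_loss(i, r):
    v_droop = i * r       # voltage lost between contacts and transistors
    p_loss = i ** 2 * r   # power dissipated as heat in the network
    return v_droop, p_loss

v_f, p_f = droop_and_loss(i_core, r_front)
v_b, p_b = droop_and_loss(i_core, r_back)

print(f"frontside: {v_f*1e3:.1f} mV droop, {p_f:.2f} W lost")
print(f"backside:  {v_b*1e3:.1f} mV droop, {p_b:.2f} W lost")
# At fixed current, the I^2*R loss falls by the same 25% as the resistance.
```

The same scaling is why the commenter can claim the extra silicon in the heat path is offset: less power is turned into heat before it ever reaches a transistor.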
Hey, I'm just a person. But you did say you would answer questions so...
What does this mean for mobile phones?
I've got a couple of questions for you. I'm currently studying computer architecture, so I might just be missing some common knowledge about manufacturing processes.
1) If the pin connections are now on the backside along with the PD network, how do external data signals connect to the signal network?
2) Is more material lost/wasted due to adding a new carrier wafer and removing the old one? (8:50)
@@parsnip908 Power and I/O are still routed to the same side eventually. This isn't reflected well in diagrams because you have to cut away somewhere. Most of what is on the frontside is the communication network on the chip, as this is what really dominates those low metal layers and gets in the way of other stuff. BPD should really be called BPD&I/O. You can still mount the chip in a BGA form factor like normal.
There is technically more waste material as you use 2 wafers in the production process, but since the yield can be quite high and the pitches in the metal layers relaxed, you can actually save money in the production process. This is part of why 18A will be offered externally after the technology proves itself on 20A.
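As a rough illustration of why higher yield and relaxed pitches can outweigh the second wafer's material cost: all figures below are invented for the sketch, not foundry data:

```python
# Toy cost-per-good-die comparison, frontside vs backside flow.
# Every number here is an illustrative assumption, not foundry pricing.

def cost_per_good_die(wafer_cost, dies_per_wafer, yield_frac):
    return wafer_cost / (dies_per_wafer * yield_frac)

# Frontside flow: one wafer, but dense dual-use metal layers hurt yield.
front = cost_per_good_die(wafer_cost=17_000, dies_per_wafer=300, yield_frac=0.70)

# Backside flow: a carrier wafer adds material cost, but relaxed metal
# pitches on both sides are assumed to push yield up.
back = cost_per_good_die(wafer_cost=17_000 + 1_000, dies_per_wafer=300, yield_frac=0.85)

print(f"frontside: ${front:.2f} per good die")
print(f"backside:  ${back:.2f} per good die")
```

With these assumed inputs the extra wafer still comes out cheaper per good die, which matches the commenter's point that "waste material" and "cost" are not the same thing.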
A High Yield upload? Time to fill my brain with all sorts of semiconductor knowledge goodness
Small clarification regarding the M0, M1, M2, etc. labeling. In the BEOL image from Wikipedia at 5:45, you identify and highlight "M0" as the lowest metal layer in the BEOL steps. In that image, M0 would actually be the tungsten metal, which is identified as part of the FEOL processes.
M0 is often called a "local interconnect", because it is a metal layer laid directly on the Si surface and used to connect immediately local transistors together. For example, in an inverter gate with 1 NMOS and 1 PMOS, where the Drain of each transistor is connected to the same node, a local interconnect would be used to connect the Drains, by depositing an M0 layer directly on the Si, contacting the doped areas of the transistors that form the Drains. M0 is made of metals like Tungsten or Titanium Nitride, not the typical Copper or Aluminum used in higher layers.
So M0 is a "special" metal layer used for local interconnects, because it uses different material and has to be deposited and formed in different ways than the other metals, since it's in direct contact with the Silicon. It's also used as the "Contact" metal for higher metal layers that need to reach the Silicon. M0 is the only one that actually makes contact with the Silicon, being deposited directly onto the Sources and Drains of the transistors to form vertical contact structures. Then, when the wafers move to BEOL, the first "typical" metal layer, M1, is made of Copper, and it goes down and touches the top of the M0 layer.
In your BEOL image, you can see that the orange metal layers are labeled Cu1 (Metal 1), Cu2 (M2), up to Cu5 (M5). Each of these layers really comes in a pair of layers, because each Metal layer requires 1 layer for "Vias", which are the vertical structures that make contact between 2 metal layers, and 1 layer for the horizontal metal layers themselves. In the image, "Cu1" actually includes "Via1" (which is the vertical contact between the top of M0 and the bottom of "Copper 1") and "Copper 1" (which is the Metal 1 layer that makes the interconnects between blocks). Both Via1 and Copper1 each require their own photo masks and process steps. Then "Cu2" is actually "Via 2" (the vertical connection between the top of Copper1 and the bottom of Copper2) and "Copper 2", again each requiring their own masks and steps.
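A toy way to see that via/metal pairing is to count masks. The two-masks-per-level rule below is schematic only; real flows multi-pattern some layers, so actual counts run higher:

```python
# Schematic mask count for a BEOL copper stack where each metal level
# contributes one via mask plus one line mask, as described above.
# Purely illustrative; real processes multi-pattern lower layers.

def beol_masks(n_metal_layers: int) -> int:
    # Via_k connects the top of metal k-1 (or M0) to the bottom of
    # metal k, then metal k itself gets its own mask.
    return n_metal_layers * 2

for layers in (5, 12, 18):
    print(f"{layers} copper layers -> at least {beol_masks(layers)} BEOL masks")
```

Even this lower bound shows why adding metal layers is expensive: every extra level is at least two more lithography steps.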
It's crazy we've gotten to the point where we can make this stuff at all.
Yeah... yeah, it's crazy. It's a feat of mass-production engineering. We're at the point where the technology is reaching its limits, so they have to precisely improve everything they can to get more performance.
When I hold a chip when I'm building a computer or something, I get the chills. This is the closest thing to "alien technology" we have IMO. The fact that you can get such an artifact for some hundreds of dollars is insane. Not to mention the enormous use you get out of it.
and yet we are still at endless war over bullshit, with evil pigs leading stupid pigs...boggles the mind
In the '90s, My Granddaddy had the same awe & amazement as he told me about the very first radio he ever witnessed, before WWII. "It was a machine that stood in the floor, right in front of an open window. It had a wire attached to the back of it that went out the window to a pole. With nothing else attached to it..." 🤔😳
@@lilblackduc7312 That's a crystal AM radio receiver. You can run those radios on nothing but the antenna; no power from an outlet is needed if the antenna wire is long enough. My electronics teacher back in junior high told us the wire had to be at least as high as a coconut tree to receive AM signals in my local area.
I work on Global Foundries' 22FDX process node which is a 22nm FD-SOI process for RFIC layout design and I gotta admit, this would be ridiculously useful in ways I can't even describe for the RFIC or Analog industry. I imagine it'll get significantly more useful the smaller your process node since the resistances get real high real quick when your metal connections have to be incredibly thin and the vias are tiny.
If I could put my thick power delivery wires on the back and not have to share the area over a bank of devices with the data wires my life would be a hundred times easier and I could work at twice the speed. This could even allow some sort of automation for the power routing.
Everything would become significantly more power efficient and thermally ideal by doing this too.
I must admit, I was rather surprised to find there is only one bottom-layer metal (M0). In my process node we have M1 and M2 before going to higher-layer block routing metals. I guess it makes more sense for a larger process node and for RFIC design.
I can't say a whole lot right now, but I will say that there is active development for the type of power routing tools you're talking about. At the very least there is talk of offering assisted power routing for some 18A customers.
I keep on wondering: wouldn't it make sense to try to back port some of these Backside Power Delivery approaches also to older / larger nodes, like the 28 nm & 22 nm nodes which are still largely preferred for economic designs, especially considering that BPD promises both cost & power advantages? (Of course assuming that customers of these nodes are willing to invest in new layout design methodology & update their cell libraries)
@FrankHarwald With legacy nodes it's not as worthwhile: for less than the cost of spinning this up, customers can jump to something like 14/12nm.
My head actually hurt from trying to comprehend the scale of complexity in designing this. Truly remarkable.
As an engineer at Kulicke & Soffa 24 years ago, this content somehow educates me on the updates of the semicon industry. Thank you.
damn that was extremely insightful, info that I could never find on any other mainstream analysis channels
You are correct! I have been so happy I found his videos shortly after he started. I have always been impressed, and he quickly became my favorite hardware information resource. I'm so happy to see him getting the appreciation he deserves! It's amazing how fast his audience has grown, as well as how he hasn't let it go to his head. You can tell he does it because he likes it. None of that arrogant "blah blah blah join this community (based around myself!!)" nonsense. No pumping out videos just to game the algorithm, etc. Just good information, excellent insight, and straightforwardness (not sure that's a word, but oh well). The only problem with him not kissing the algorithm's butt is that YT almost never notifies me about his videos, even though I am subscribed and have all notifications on. Anyway, enough kissing his ___ 😊
Absolutely insane. I had to pause at some of the slides just to stare at the complexity.
Every time I say "that is surely it. They can't go further than that!" a new development blows my mind.
Bring on the sandwich era of chips!
very informative, thank you
Thanks a lot for the tip!
Amazing.
I also think having separate metal layers for data and power could allow easier "gluing" of smaller dies together to form a bigger chip, as you can route the I/O on top without going to the package substrate or even solder bumps.
Glue the chips together, then etch the hermetic seal and build yet another, bigger metal layer on top of the smaller glued dies to make the I/O path even shorter. Thus you could make a huge die, but with the yields of smaller dies. That wasn't possible with flip chips, but now one could imagine it being possible.
Another fantastic video man👏🏻👏🏻You’re literally one of my top 3 favorite tech channels, great combination of in-depth knowledge and understandable delivery! Please never stop making these quality videos!👍🏻
Thank you! I remember watching a video from you about how to optimize speakers a few years ago :D
@@HighYield Haha small world, man!😀 Once again, highly appreciate what you’re doing here with these superb video!👏🏻
I do have to wonder about thermals and whether that will limit the benefits for P cores. The main heat generating component is the transistor layer and this method puts the signal layers in between that layer and the cooling system.
People used to polish down their CPUs just to reduce this distance by several nanometres and improve performance so I don't think it would be insignificant
Everything you said makes complete sense from a physical point of view, but they did claim a 6% frequency increase, so the lower power losses combined with better signal quality must completely offset the higher thermal resistance. After all, all that wasted power in the classic design ended up as heat, and a suboptimal signal network meant they had to use more power to hit frequency targets.
@@sznikers Yeah, what this guy said.
@@sznikers Pretty much exactly this. Cleaner signals means you don't spend as much power cranking your transmit and receive points, and you lose less to resistance everywhere since the paths can be direct and physically thicker.
Intel's current method is to push the cores as high as you can. With BPD&I/O, you have to find a good balance between how much power you can push and what perf target you want.
It's an added constraint to manage.
Lapping (not polishing) CPUs & heatsinks was to improve surface contact, not reduce thickness.
I would like to know whether backside power delivery allows SRAM to start scaling with the process node again. It would be interesting.
Very interesting question.
Don’t forget that lower thermal resistance is intrinsic to this as well
As well as increased power capacitance very near the transistors, allowing lower noise. Large power-plane surfaces are free massive capacitors. Nvidia used available space in the metal layers to make capacitors to reduce power noise.
Add to that, keeping (now lower) power noise away from the signals will help too.
@@HansSchulze and less parasitic capacitance to signal lines
@@snaplash I/O yes, but that’s nothing compared to the interconnects which are front side
Lifelong learner, first time down this particular rabbit hole. Comparatively, this video worked for me. It nicely highlighted the current design limitations and the theoretical advantages of the new design. The first lightbulb was which side is frontside vs backside.
Wow! Seems amazing, thank you for diving deep into how technology works, there are not many resources dedicated to that
You didn't comment on heat dissipation; since there will be more layers below the transistors, it could become harder to remove the heat, or there could even be more of it. I hope for more videos like this, thanks for the content!
Yeah, there's been a lot of discussion on this, but it's still not clear. I think it might just put 300W consumer CPUs in the past.
@@ItsAkile Yes, but it still seems to be a go for notebooks and SI OEMs. If I had to bet, I would say it's for E-core-only chips.
Intel said there isn't much of a difference and the increase in efficiency (less energy lost to resistance = less heat) makes more than up for any decrease in thermal capabilities.
@@pedro.alcatra Nah, that was just for testing. It's a full feature to be widely implemented.
I just saw someone adding cooling to both sides of a motherboard which didn't make much sense to me at the time. And the POC didn't work very well. I would think that the power delivery side would need the most heat dissipation. This seems super promising.
Super technical video, presented in an excellent manner, dense with actual information...
Too often, people make videos where they TALK a lot of words but don't really SAY anything!
great video
Super curious about the primary heat path in this configuration. It seems like it would have to be down into the power layers and ultimately into the substrate and PCB.
One of the clearest explanations of a complicated subject I've ever seen on YouTube. Thank you!
thanks for the explanation. It was really clear and easy to grasp 👍
The chip-machine maker ASML has been redesigning some parts that we make as their supplier; one of the assemblies was pretty much flipped upside down.
But the changes are only expected to hit production in a year or two. Fun to see these kinds of tech developments reflected in things I see happening at work, because as a parts supplier you never get the full picture of what they're actually doing.
Hey good to see you again buddy!
Another well-paced, well-explained semiconductor video. Amazing!
Since this method works both sides of a die, could it be used to further develop the transistor layer before making the second metal layer?
Yes, I believe Intel is looking at making stacked "CFET" devices, which effectively form a vertical NMOS/PMOS pair that can be connected on both sides.
What about cooling? Having the transistor side closer to the heatsink significantly improves heat transfer, I would imagine. Considering they are pumping hundreds of Watts into their chips, it sounds like it would actually decrease performance, as e.g. "turbo boost" wouldn't be able to clock as high.
The gains from the lower internal resistance pretty much completely offset this. Blue Sky Creek saw a 5-10% clock speed increase over production-volume Intel 4 node chips. For example, if Meteor Lake were using this test node, you would see boost clocks about 400 MHz higher on the P-cores and 250 MHz higher on the E-cores, which doesn't sound like much, but it is done at the same power draw.
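Spelling out the arithmetic behind those uplift figures, using assumed Meteor-Lake-like base clocks (the base frequencies below are illustrative guesses, not official specs):

```python
# Rough arithmetic behind the quoted uplifts. Base clocks are assumed,
# Meteor-Lake-like illustrative values, not official specifications.

p_base_ghz = 5.1   # assumed P-core boost clock
e_base_ghz = 3.8   # assumed E-core boost clock

p_uplift_ghz = 0.400  # +400 MHz quoted for P-cores
e_uplift_ghz = 0.250  # +250 MHz quoted for E-cores

p_pct = 100 * p_uplift_ghz / p_base_ghz
e_pct = 100 * e_uplift_ghz / e_base_ghz

print(f"P-core: +{p_uplift_ghz*1000:.0f} MHz = +{p_pct:.1f}% at the same power")
print(f"E-core: +{e_uplift_ghz*1000:.0f} MHz = +{e_pct:.1f}% at the same power")
```

With these assumed bases, both uplifts land inside the 5-10% range quoted for Blue Sky Creek.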
@@DigitalJedi “Back in my days” ™ 400 Mhz was the whole processor speed. If you were lucky, and rich.
Great video :)
I was wondering, why don't they create the signal layers first, then the transistors and then the power delivery? That way you would avoid grinding down to the transistors and adding structural support again.
That would need an entire redesign of how the silicon lithography part works, I think, which would lead to many new validation and R&D costs.
I think the transistors themselves need the silicon wafer; they can deposit a metal layer on top of silicon, but not a silicon layer on top of metal. My best guess, not an expert.
Because the metal layers have to connect to the silicon layer and building up silicon on top of metal seems like a really difficult process.
@@eggnogg8086 @jorenboulanger4347 - that's right. The silicon substrate (the wafer itself) is the magic that makes semiconductors work. Transistors have to be built directly into the silicon itself. Then, metal layers are deposited on top of the silicon.
So the FEOL processes that @highyield talked about embed the transistors directly into the silicon (and with 3D transistors like FinFETs and nanosheets, there is some deposition above the Si as well, but the point remains: the transistors are built "in" the Silicon). Then we need to use metal to connect the transistors together. These are the BEOL processes, which start building layers of material on top of the silicon surface, including metal, oxides, insulators, etc. So, essentially, the FEOL processes start from a bare silicon wafer (a nearly perfectly pure crystalline substrate, which is required to build the transistors), and then metal layers are added on top using other deposition processes.
The whole process starts with a blank Si wafer, so the transistors have to be built on it. You can't, for example, build a bunch of metal layers first on top of the wafer, then build the transistors on top of those metals, then add more metal on top of the transistors, because the transistors have to be built directly in/on the Silicon wafer. So if you're starting with a Si wafer, you have to build the transistors first, then add metal layers on top. Then flip the whole thing over, grind the backside of the wafer down so the bottoms of the embedded transistors are nearly exposed, then build up another set of metal layers, creating a metal/silicon/metal sandwich.
Very clear explanations, great delivery.
must watch channels, especially tech discussions with precise details, epic 🎉
Another banger. Thank you, Mr High Yield
Thank you for such an interesting video. A question: do you know if PowerVia will help scale the I/O and analog parts of chips further down again? New manufacturing nodes only scale compute further down, while analog and SRAM are lagging there. Is PowerVia helping towards better density there again? ❤
Great Video! Always a joy to see one of your videos
This is the best, most clearly explained video on the topic of BSPD that I've seen so far. Even the manufacturers themselves do a worse job of explaining it. Good stuff.
Why is a BSPD chip not flipped at the end?
It gets flipped to build the backside on top of the wafer, and should then be flipped again to connect it to the substrate, or what am I getting wrong? (Timestamp: 11:05)
Flip chip is in regards to the frontside (= the top of the wafer) facing downward. With BSPD, the backside is down and the frontside is up, so it's technically not in a flipped position when finished. It's not about whether you flip the chip during manufacturing. Maybe I could have explained it better.
@@HighYield Thanks for the explainer 👍 so flip chip is basically just another phrase for backside up?
Great video! A question on the backside I/O routing: obviously the I/Os need to connect in some shape or form to the topside signal routing. How is this performed? Would you have to route from the backside through the silicon and out to one of the topside metals? I can't visualize this. At what topside metal layer does the I/O connect? Is there one large via through from back to front?
One of the better illustration and explanation. Thank you
always so enjoyable to watch! even if i don't understand 90% of what's happening lol.
I understand at least 70%!!
Could you explain how the metal layers interconnectors are built? The ones that go above and below the transistors?
That'd be an interesting topic for a future video.
Still don't know how they connect them to the PCB, front and back both.
The frontside doesn’t need to be connected to the PCB, as it only handles chip internal communication. Power supply and I/O is routed through the backside, which is the side connected to the PCB.
To think when I did chip routing software, a 3 metal layer process was considered advanced.
It's getting indistinguishable from magic now 😉
Excellent video. I always learn so much from your videos.
such a good vid. i've learnt a lot :) thanks mr high yield
Excellent video, I work in semiconductor section and this is great info for me as well.
Why no video about the qualcomm nuvia chip?
A clear overview, insightful analysis, and a very fine review. All in all, a beautiful presentation on the topic of power and signal distribution systems and technology in semiconductor devices. 👏👏👏👌👌👌
Finally BSI content that is digestible and can be understood ^^'
Thanks!!!
Glad it was helpful!
My brain parsed that as Business Service Integration. I spent too much time with corporate again, methinks.
Does it have a use in complex chip designs, or just small low-power chips? How will you use active cooling on chips where you have higher-voltage power delivery under the chip and an extra, less thermally conductive layer of wafer on top?
Double-sided circuits on the chip would be even more efficient, with vias to connect power and signals between both sides. Next is stacked chips; has that been done?
Awesome Video, thanks!
I'm very happy to see this video. A slight bit of criticism: give yourself time to speak slower sometimes. Around 7:56 it is hard to understand you; it almost feels like you're trying to speak as fast as possible. I don't think it applies to the whole video, but it does to some segments, so even if the video were 20 seconds longer, it wouldn't hurt much.
But your content is top notch - and indeed problem of power delivery and data transfer, wiring etc. is a problem that is constantly evolving. Meaning that even if you find a solution, often people will take it for granted and within couple of years you will have to find other solutions to very similar problems. And yes, half of the evolution of microchips might be smaller nodes, but the other half is how to get them powered and linked. Very underappreciated topic, so I'm glad you're covering it! Good Job! 👍
I played the video at double speed and had no problem understanding High Yield. I think that he articulates very well for a non-native speaker. Just my 2 cents.
@@aapje Agreed. It was only around that timestamp, the first few sentences of the "Backside Power Delivery manufacturing" chapter, that I had trouble. Though I'm not a native speaker myself, and I wouldn't be able to follow half of YouTube at 2x speed.
What about the large amount of heat generated in the component, and how to reduce the heat generated and the coupling it requires?
People working on these things are heroes. The significance of chip-making cannot be overstated.
This was fantastic! Thanks!
Woohoo! Yay! Missed ya man! Thanks!
I had the script sitting around for weeks, but never got to filming :/
do you have a link to the article that the logic/power separation at 7:17 comes from?
Here you go: ig.ft.com/microchips/
good tutorial very well organized and presented for comprehension. mb
On the old rules, Meteor Lake's per-unit cost across risk, ramp, peak, and run-down will progressively move down from $227 per unit at risk production, through peak and just past peak, to an $88 marginal cost per unit at the 36-millionth unit of first-gen disaggregated SiP production. The question is where learning stands right now on realizing those marginal cost reduction objectives. mb
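One way to read those two cost points is as a Wright's-law learning curve. The sketch below fits one, assuming (purely for illustration) that the $227 risk-production cost applies around the one-millionth unit; that anchor volume is not stated in the comment:

```python
import math

# Hedged sketch: fit a Wright's-law learning curve to the two cost points
# quoted above. The anchor volume for the $227 risk-production cost is an
# assumption for illustration only; the comment does not specify it.
c_risk, n_risk = 227.0, 1_000_000    # assumed anchor for risk production
c_late, n_late = 88.0, 36_000_000    # "$88 ... at the 36th millionth unit"

# Wright's law: cost(N) = a * N**(-b)
b = math.log(c_risk / c_late) / math.log(n_late / n_risk)
a = c_risk * n_risk ** b

learning_rate = 2 ** (-b)  # cost multiplier per doubling of cumulative volume
print(f"b = {b:.3f}, cost per doubling x{learning_rate:.3f}")
print(f"predicted cost at 10M units: ${a * 10_000_000 ** (-b):.0f}")
```

Under this assumption, cost falls roughly 17% with every doubling of cumulative volume, which is the kind of learning rate the "where is learning right now" question is probing.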
One of the rarest and rewarding watchtime in the entire year, congratulations 🎉
To clarify, this will not affect how these get connected to boards, correct? It will however mean growing a layer of silicon on top of one of those metal layers.
I wonder how that will work out. I'm only used to starting with one wafer and then building atop it.
1:40 What if you do like a PCB and make double-sided chips, or 3-layer silicon (or a silicon substitute) with power delivery provided from that middle layer?
Now take AMD Ryzen: instead of just using the top surface, you could put another full Ryzen on the backside, use a sort of middle interposer for power delivery, and use some custom logic to simply run it like a multi-core CPU with hyperthreading.
Now you realistically can get a 64-core CPU or higher.
Same with a GPU.
In effect you split the die thickness in half, with a center interposer that provides syncing and glue logic, and you can still achieve backside power delivery for both physical units.
It'd be a weird-looking package and would not be compatible with current sockets due to cooling-system changes; in this context you'd probably want a heatsink with a dissipation capacity of 300W or higher on its own, with liquid cooling.
The chip package would probably resemble a PLCC-style chip with the pins or contacts around the edges and towards the sides.
Because both sides are used, that would allow you to mount the chip on a riser card to facilitate cooling on both sides, at the cost of more space and overhead.
Call it Full Silicon Fabrication.
We already effectively 3D-print certain chips like DRAM anyway to get storage density, so making a layered silicon PCB shouldn't be a wild idea.
Because that's all a chip is: a PCB.
On a PCB, connected to other PCBs.
Each at a different scale.
You just shrink the PCB down so you can only see it under SEM or TEM photography.
I wonder about heating and cooling when the silicon is in the middle layers.
Appreciate the video. How does this approach translate to clock speeds? If the power network is longer, does that mean lower clock speeds? Also, if it has so many advantages and is even cheaper to manufacture, why hasn't it been done before? At 17:30 in this frame you can clearly see Mitsui; I had no idea they made semiconductors as well. Are the Japanese also at the cutting edge of semiconductor manufacturing?
The reduction in voltage drop leads to a cleaner and more stable supply of voltage to the cores, which results in higher clock speeds, just like (on a different level) a better and more stable PSU can also help stabilize an overclock. It hasn't been done before because, up until recently, you didn't really need it and frontside power was good enough. Only over recent years have the metal layers become so complex that looking for a solution became a viable option.
As for Mitsui, I think we are looking at a photomask or a cover for one. It's a mask for Intel's Meteor Lake.
Doggy Style vs Missionary Power
?😂
Awesome, thx! 😍
Given the significant benefits of backside power delivery outlined in the video, how might this technology influence the future landscape of consumer electronics, especially in terms of device miniaturization and energy consumption?
I don't think it'll be long before that second silicon layer starts playing host to a second layer of transistors.
And then we can start playing with architectures that have top-layer "master" cores, in setups with bottom-layer sets of 2, 4, 8, or 16 small "slave" cores.
Since the first wafer is removed it could be thinner to begin with, saving wafer cost and time to grind it away. Also the final structural wafer wouldn't need to be suitable for transistor production, allowing cheaper materials (failed wafer recycling?) or even something with better thermal transfer.
Manufacturing that depends upon silicon support will have to change.
Great video. I'd love it if you made another one about Samsung's backside power delivery mechanism.
Samsung's isn't far off from PowerVia, but it would be an interesting video to compare the 2 once both are on the market. Hopefully we could get a comparison of 2 actual dies and see what makes them tick.
@@DigitalJedi IIRC it'll debut with SF2 next year.
Seems it would help with cooling too, if the transistor layer is closer to the heatsink contact surface
Great video thank you.
All the other videos focus on the physical limits of transistor size but never even touch the problems of the metal layers. Nice one.
Very good explanation
I assume that in MCM chips this advantage of backside power delivery becomes less useful?
It is solving different problems. MCM is a workaround for transistor density limits, while backside power delivery is a workaround for metal layer density.
I actually did my PhD on pretty much exactly this. MCM and BPD solve completely different problems, and in fact you can use BPD to boost an MCM design. So it's quite the opposite really. The 2 technologies work very well together.
MCM allows you to scale beyond the reticle limit without going to complex methods like pattern stitching, which is how Cerebras makes their wafer-scale chips. You see this pretty well with Nvidia's B100 and AMD's Mi300 series chips, which this channel has a fantastic video on. It can also be used to scale beyond a practical limit with yields or cost, which we see with AMD's Ryzen and Navi31 and 32 dies using older, more mature nodes for I/O functionality, or with Intel's Meteor Lake optimizing different dies for just one specific task.
BPD works within the die itself to free up the internal signalling and external power and I/O networks from each other, giving more space to optimize both and lower internal resistance with thicker wiring for lack of a better term. This helps your MCM design as your chips have to communicate somehow, and that freed space on the power & I/O side means better links between dies.
You can mount a bunch of BPD chips on an interposer to make an MCM design just like traditional dies. Intel has that in the pipeline already for some 18A-based server chips. You can even apply the same principles to that interposer if you want, using multiple layers for die-to-die vs external communication and package-level power.
Such good content!
Thank you for this video. I now understand a bit of what all the fuss is about, and I think the fuss is fully justified.
I'm curious where you got the image you used for the thumbnail. Was this an AI-generated image, or did you find it somewhere? Could you share a higher-res version?
It's AI generated and then I edited it with Photoshop. I can upload a high-res version, but it's not a real product or anything like that.
@@HighYield I really like the look of it as a stylistic reference. If you could share, that would be great. It's just hard to make out the details at such a small size.
@@Eikonic_ Here you go: imgur.com/K23zDl0 (that's the original AI generated image)
@@HighYield Very nice! Which AI did you use? I gave it a quick try with Adobe Firefly and it looked like crap. This is super clean and detailed. Thanks for sharing!
This channel is just 🤩
Why not use both on both sides? I guess because we want to separate the higher-voltage power from the more sensitive signal circuits?
You don't need nearly as many connections for power delivery as for signals, so far fewer "power" wires go around to the other side. And having signal and power separated of course decreases interference between them.
I wonder how cooling would change with this tech. If there are contacts on both sides of the CPU, how would cooling be supplied?
There are only external contacts on the backside. The frontside is now entirely dedicated to internal routing. It should be called BPD+IO. Your current chips are face-down, these will be face-up, so technically no longer flip-chip.
Thanks for the explanations, I really love this kind of video
It sounds so easy in words, but... how do you detach the nanometer-scale thin layer of ready-made topology from the base crystal (remember - this is a SINGLE crystal) and flip it over???
Great explanation. Thanks for doing this.
Dude, gorgeous graphics at 10:08.
I wonder how this affects heat.
Power efficiency allowing higher clocks at less voltage means less heat, but it also means more overclocking room with better cooling solutions, so overall it scales out about the same as today: you get the best speed you can for the temperature you can maintain. What it really means is devices are going to get even faster and more efficient.
One variable I would like to know is which parts of the die generate more heat than the others. I'm assuming it would be the power delivery and I/O, but if this design improves those power lanes, I think it might reduce the waste heat. I'm not sure of those details, but it would be nice to know.
I am not a qualified semiconductor expert, but assuming that the total die thickness stays the same, I would think that backside power delivery would improve temps, and not only because of the improved power efficiency. The transistors have been moved slightly closer to the IHS, and they have the copper signal wires between the transistors and the IHS conducting the heat towards it. On top of that, the larger power wires may also improve temperatures due to their increased thermal mass.
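That "transistors closer to the heat spreader" argument can be sketched with a crude 1-D thermal stack, where each layer contributes thickness / (conductivity x area) of thermal resistance. The layer thicknesses and conductivities below are rough illustrative assumptions, not real die dimensions:

```python
# Crude 1-D thermal model: delta_T = power * sum(thickness / (k * area)).
# Layer values are rough illustrative assumptions, not real die dimensions.

def temp_rise(power_w, layers, area_m2):
    """Temperature rise across a stack of (thickness_m, conductivity_W_per_mK) layers."""
    return power_w * sum(t / (k * area_m2) for t, k in layers)

area = 1e-4    # ~100 mm^2 die (assumed)
power = 100.0  # watts (assumed)

# Conventional flip-chip: the silicon bulk sits between transistors and heat spreader.
classic = [(700e-6, 148)]             # ~700 um of silicon (k ~ 148 W/m-K)
# BPD-style face-up die: thinned silicon plus the copper-rich signal stack on top.
bpd = [(50e-6, 148), (10e-6, 385)]    # thinned Si + interconnect modeled as copper

print(temp_rise(power, classic, area), temp_rise(power, bpd, area))
```

Under these toy numbers the BPD-style stack shows a much smaller temperature rise, matching the intuition above; real dies add spreading effects, TIM layers, and non-uniform hotspots that this one-liner ignores.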
Maybe this way will make it clearer: there are 3 types of metal connections: transistor-to-transistor, power-for-transistor, and transistor-to-I/O. Since the transistors are on one plane, the majority of the metal layer connections are transistor-to-transistor. What backside power delivery does is put all the transistor-to-transistor metal layers on the front side, and put the power delivery and I/O metal layers on the backside. Then the front side will face your heatsink, and the backside will connect to micro bumps that eventually go to your motherboard.
Man, you rock!🤘Awesome content as always
What stopped them doing backside power delivery until now? Seems like an obvious thing to do if you're tight on space on just one side.
Complexity and costs skyrocket with EUV nodes, and the ever smaller feature sizes needed novel metal solutions; it was becoming too hard to achieve a balance between data and power signals.
Great summary and explanation. In addition to backside power delivery, I think we have already seen another thing that helped with segregation of data/memory and power. That was the use of chiplets or tiles. Instead of having a monolithic piece of Si as one CPU chip that has everything, now they have different chiplets purposely optimized for each function.
Amazing breakdown!
Very nicely explained and contains in-depth details. Are you available on Twitter/ LinkedIn? I want to follow you.
twitter.com/highyieldYT 👋
Thanks. Followed you. Are you from Bay Area?
Basically, the reason for backside power delivery is that transistors still follow Moore's law, but wires never followed Moore's law! I learned this from Preparata in a circuit complexity class in the 1980s... If you shrink a wire, you have to pump more current through it to overcome increased resistance and maintain latency! We had a one-time scaling of wires in the 1990s when we converted from aluminum interconnect to (more conductive) copper interconnect. And from your VLSI chip interconnect, it appears that wires have scaled a 2nd time in 3D. So this is perhaps the third and last time we can scale the wires!
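The wire-scaling problem that comment describes falls out of the basic resistance formula R = rho * L / A: shrink a wire's cross-section while keeping its length and the resistance climbs. A minimal sketch, using bulk copper resistivity and made-up wire dimensions:

```python
# Wire scaling sketch: R = rho * L / A. Shrinking the cross-section while
# keeping length raises resistance. Dimensions are illustrative assumptions.

RHO_CU = 1.68e-8  # ohm*m, bulk copper resistivity (ignores thin-wire size effects)

def wire_resistance(length_m, width_m, height_m, rho=RHO_CU):
    """Resistance of a rectangular wire of the given dimensions."""
    return rho * length_m / (width_m * height_m)

r_old = wire_resistance(1e-3, 100e-9, 200e-9)  # older node (assumed dimensions)
r_new = wire_resistance(1e-3, 50e-9, 100e-9)   # linear dims halved, same length

print(r_new / r_old)
# Cross-sectional area shrank 4x, so resistance rose ~4x. Real scaled wires are
# worse still: at these widths, surface and grain-boundary scattering push the
# effective resistivity well above the bulk copper value used here.
```

This is exactly why the thick, relaxed-pitch backside metal layers help: power rails no longer have to squeeze through the finest-pitch (highest-resistance) wiring.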
My question: will Intel's next-generation desktop processors coming at the end of 2024, the Arrow Lake desktop chips that will go up against AMD's Ryzen 9000, use the
Intel 4 process or the more advanced Intel 20A?
How is it possible that Intel is taking so long to catch up with TSMC's production node?
Some Arrow Lake products will use 20A, but most will be on TSMC N3B.
Absolutely the best, most informative video I've seen in a very long time. I plan to share this video with all the keyboard jockeys who have jumped on the bandwagon lately; all the negative Intel chatter on the web has nearly made me sick. This information is going to fill their minds with knowledge, the good kind. Thanks a million, good job.
I consider myself a chip enthusiast (I regularly use and program computing devices). This was very interesting.
Excellent work!
That part at 13:50 should be an obvious result, and I was thinking that as soon as I saw how the separation of data and power signals works on the die. If today's High-NA EUV lithography is compatible with all this, and I don't know why it shouldn't be, I see this as getting us down below "2" or "20A" process nodes and easily down to "1.7" or "17A". And I'm quoting these numbers since they're advertising numbers and don't actually refer to any measurement.
I think this puts us on the path to 500 million transistors/sq. mm AND should allow transmission logic to start shrinking again (transmitting signals off die, which right now doesn't scale past about "6" for process nodes).
As this matures it should get us to "14A" and this is the point where I start to question if the world needs higher density than that. Small devices won't, PCs won't, laptops won't. The main thing they need is power reduction and this looks like a big step towards that.
We can always use more computing power.
@@aapje I'll copy/paste the other comment I made because it gives more detail on this particular point. I agree we need more compute power in most devices than we have now. I don't necessarily believe we'll need much more than what we'll get to in about 10 years for mostly home devices, and what we'll need more is less power consumption. This is a personal opinion. I don't think we need a world where anyone can hold the power in their hand to hack any organization and bankrupt entities while they're stealing their money, from their smartphone. I just don't. That world sounds too dangerous. So, the rest of my thought:
"Something I've said for the last 3 - 4 years, in paying attention to what transistor density is moving towards, I've been skeptical of these neverending videos about having to move on from silicon based ICs because we are near its limits.
Well, while we might be approaching a point of diminishing returns for transistor density what I have said is the more important issue is going to be one of power consumption because really, approaching 500 million transistor in a sq. mm is a LOT. As in, you can have a powerful computer in your hands and that day is already coming as handheld game devices have shown.
Humans will never need a handheld that can solve every secret of the universe. To be able to get the power of today's desktop computers which are pretty powerful into a handheld is about as good as it needs to get and we can solve that problem with silicon based circuits.
The bigger issue is engineers needing to get better and better about parallel processes, not only in software but hardware so we don't need to clock a handheld device at 6GHz."
And what I meant by parallel processing is not the same as what happens now in regards to hardware, which is many core CPUs. I mean what happens in a single core, where right now branch prediction does a good job of speeding up the time a thread is completed. This is more in line with what I mean, where a core gets better at parallelizing a string of instructions, because this is always a hangup, how fast a single thread can be completed. Right now we mostly rely on ever faster clock speeds, although Intel and AMD are getting better with their architectures.
For software on the other hand I mean software engineers need to get better at parallel processing. Some companies are doing a good job, others aren't.
@@johndoh5182 I think that you have a completely unrealistic view of what even a 100x increase in compute power would do.
For example, most hacking doesn't involve brute forcing and that which does, is done by governments on huge server farms with way more than 100x the power of a single server, let alone a single hand held. And they still can't just hack or bankrupt any organization at will.
And single threading will most likely forever be the issue, because if you add more parallelization and add more cores to match, you'll just get another single-core bottleneck when adding cores no longer helps.
@@aapje "And single threading will most likely forever be the the issue, because if you add more parallelization and add more cores to match, you'll just get another single-core bottleneck when adding cores no longer helps."
Yeah you didn't read this part because that's what I said:
And what I meant by parallel processing is not the same as what happens now in regards to hardware, which is many core CPUs. I mean what happens in a single core, where right now branch prediction does a good job of speeding up the time a thread is completed. This is more in line with what I mean, where a core gets better at parallelizing a string of instructions, because this is always a hangup, how fast a single thread can be completed. Right now we mostly rely on ever faster clock speeds, although Intel and AMD are getting better with their architectures.
@@aapje "I think that you have a completely unrealistic view of what even a 100x increase in compute power would do. "
Cool, it wasn't really meant to be a factual statement. I don't think a smartphone needs to be 100X more performant than they already are. Once again, that's just me. On the other hand about 5X the power of current new gen products and a 50% power reduction sounds pretty great.
New favourite channel
An important note: Intel's next-gen backside power delivery nodes can, at best, match TSMC's traditional data + power nodes in perf / W, density, and cost *according to Intel*. Even Intel knows 18A's PowerVia / backside power node won't beat TSMC's traditional nodes on all major targets. Thus, backside power delivery improvements are most accurate when compared intra-foundry ("Intel backside is better than Intel traditional", "TSMC backside is better than TSMC traditional").
It's only with Intel's "2nd generation" backside power delivery in Intel 14A that Intel thinks they can beat TSMC on perf / W and cost, and maybe density.
So what's on 20A?
@@AstrogatorX The current rumors say some Arrow Lake-H chips (6+8 mobile). Lunar Lake will be N3B and for Arrow Lake-S, the CPU tile also seems to be N3B. Capacity booked by Intel before they knew if 20A would be ready in time.
@@HighYield Thanks. Keep up the good work!
Great video
Can you comment on the idea of quantum tunneling? I don't understand how that's beneficial.
Saying "Design Flaw" is grossly incorrect, as everything in engineering is a compromise, not a flaw.
I mean it's a compromise and a flaw at the same time. But you have a point. Still, I need to create some interest in the content of the video ;)
I think it was a joke 😅