@@HighYield i want to get an 8 core 9700X / and a 12 or 16Core 9900X/9950X but also both an 8core amd 16core X3D (especially if they have both v cache CCDS (or the 3D Cache/Inf fabric Link cache Bridge Die to not only have more L3 on one CCD the other has access as well through the substraight and ccd >< CCD (Bidirectional ) 😑🤤 think of the 3D V-cache (even if only on one CCD) has think of it as 4 channels rather than 2ch // if that makes sence ??? (cached data has silicon to silicon (aka 3D cache chiplet to CCD Communication but also communication over the Inf fabric in the substraight, but also the CCDs Inf fabric links on top of the CCD as well AND from CCD 2 CCD also ? hence my 3DCache Inf fabric bridge chiplet ? 🤤
Yeah, nun verstehe ich AMD, weshalb es den TDP Wert vo einigen Wochen so niedrig getrimmt hatte. In deinem Video hast du nur kurz über Temperatur geredet und vermutlich hatte dadurch AMD Angst bei höherer Taktung auf den CCD zu viel abwärme produziert wird und somit zum Totalausfall wird. Nach einpaar Wochen hatte sich die Angst via AMD als unbegründet herausgestellt. Ansonsten ein super Video und auch super erklärt. P.S. English is good - not very good. Hier fehlt ein wenig der Slang. 🙂
I hope you get invited to other channels that use this video in their content! LTT is probably too basic, but techtech potato, level one techs and some of the more tech heavy ones😊
@@Fin1nishingMoveConsumer tasks won't see Zen 5 gains. Zen5 gains more on Linux and more specialized tasks. Zen5 also gains efficiency compared to previous Zen4 release.
Double stacked 3D cache on both ccd's for 12c/24th and 16c/32th? I've also pondered Zen 5c for either mixed Z5+Z5c or all Z5c for up to 24/48. Idk if 5c cores can clock as high. But, my laymen guess is 5c's lower cache per core wouldn't be an issue with the double stacked 3D cache. I've also thought about chips like the 8700F getting 3D cache. Again, this is just me thinking aloud. But, I could see those defective APU's being repurposed into budget 3D options that would perform between standard 7700X/9700X and 7800X3D/9800X3D. Or, would the halved 16mb L3 on the die itself be too little?
This is kind of a part 2 to my first comment... What I'm getting at is how important is having a regular pool of L3 cache on the die directly attached to the cores versus just having the large 3D cache to compensate? I've heard people say that packing smaller (but equal ipc) Zen 5c cores in wouldn't hurt performance because of that large pool of 3D cache. That way you could have hybrid 20c/40th, or a 24c/48th all Zen 5c chip. Like I stated before, this is just me as a laymen asking if this would work. Idk if the clocks would have to be significantly lower on the 5c cores either. I do know that AMD has used hybrid Zen 4/4c in Phoenix APU's. And, when all cores ran at a fixed clock performance was the same.
Having 2-layer 3D-cache makes sense given the rumors that the new X3D-CPU's can boost higher. More compact 3D-cache that is further away from the compute cores would be less sensitive to heat coming from the compute cores.
As the SRAM cache is now probably stacked now on multiple layers. Wondering if it suffers also from exes heat. Question is how much can you stack before heat dissipation starts to become issue and making it more unstable etc... Even SRAM is more efficient and generates less heat.
Wasn't the main issue with the clockspeeds the lower voltage tolerance of v-cache, not the heat? The heat issue is solved with lower power limit and lowered thermal limit, but it should have still been able to clock high for as long as it had the thermal headroom instead of having locked core voltage
@@shepardpolskaSaying new X3D chips could "clock higher" really just means in relation to the non-X3D configuration. Since zen5 effectively has the same freq limits as zen4 while generally being more efficient, it's very likely X3D will get closer to this because it'll have more headroom. Especially since X3D uses binned chips capable of running lower voltages.
The 35% reduction in L3$ is quite an amazing feat. Also, with the TSVs, this is another "generational" change, and I'm impressed with how far R&D goes to improve it unlike the IOD.
@@NootNoot. AMD would probably prefer staying with the same I/O die for an entire memory generation, but I think we will see at least minor a revision, as the current one caps out at 6000MT/s.
OK, you have managed to engage my interest for Zen 5. This is the kind of analysis I have been missing since Jim from @AdoredTV stopped making tech content. That and his "Awright, guys, how's it goin'?".
Gosh those super tight-in shots near the TSVs just blew my mind. Reminds me of just how exquisitely complex modern processors are, and of course it's a tour de force in macro photography.
AMD like had extra SI vias for the previous generations to increase yields. Now that the process has matured, they don’t need them. The extra vias could have caused the power problems of the previous gens that restricted over clocking.
The fact AMD has found issues all over the place, and RUclipsrs like HBU have as well indicates that something on the consumer PC end isn't capable of pushing Zen 5 to their potential.
It took me a whole undergrad just to understand some terminologies present in video. "Just Understand". Semiconductor is quite interesting. Stuff like these make me realize there is so much more to explore. I hope more people find such stuff interesting. Hey RUclips algorithm push this video. I hope "this" helps.
The TSVs pointing to a smaller V-cache die is really interesting if you take into account Semi Analysis' deepdive on Zen 4c. In the article they do mention that part of the area savings are from moving to an (early production at the time) 6 transistor cache design rather than the traditional 8 transistor design commonly used in most CPUs currently. Without control structors, with the new smaller TSVs, with the reduction in dead silicon, and using the Zen 4c tested 6 transistor cache cell, I could see them managing to just shrink it enough to fit on the new L3 + L2 area indicates by the new TSV structure. Regardless, either this way, or the double stack proposed in the video will be interesting to analyze. Can't wait.
I really do appreciate the invaluable work that goes into these videos. The only other people who go above and beyond for such a niche subject matter are Anandtech, who are now defunct, and Chips & Cheese. Thank you for your effort.
Excellent explanation. I want to see how X3D cache will be implemented on Zen 5 since some reports tell that the 9950X3D and 9900X3D will bring something new.
AMD kept saying it was a redesign. Your video highlights this. It seems like most people can't see past minimal IPC uplift and the introduction of full avx512. Excited to see what 3Dvcache brings for this gen.
Yup. SO many comments about "AMD is getting lazy now" and what not, as if Zen 4 to Zen 5 was just Intel 13th gen to 14th gen. It can't just be that someone screwed up somewhere in this or that they just didn't get the results that they were expecting, it's always gotta be some stupid conspiracy about the chip manufacturers being Scrooge McDuck, doing everything they can not to have to make better chips. It's incredibly frustrating to try to have meaningful discussions in the tech community. These modern high performance processors are easily the most advanced pieces of bleeding edge tech that we have in our daily lives - and even enthusiasts take it for granted.
Most people don't understand that a major architecture redesign doesn't always improve performance significantly at first, there can even be performance regression in some aspects. But updating incrementally without a major rebuild eventually gets diminishing returns. So throwing everything in the air and rebuilding with a new floorplan allows large structures to change, this also increases the headroom for improvements later. Making a large structure change on an existing floorplan means working around existing structures and wasting transistors to work around stuff, squandering the transistors a new process node brought. Eventually things get complicated and your "low hanging fruit" updates are gone, like beefing up some execution units to increase FP or INT performance. Eventually you need a whole lot of stuff to change at once or you suffer bottleneck whack a mole. You need a more complex branch predictor to handle more throughout, beef up all the low level caches to handle that, completely redo the execution side to support 512 bit wide paths. Everything gets built bigger essentially. Zen 5 is fresh and should have lots of headroom to improve with future process nodes. There's lots of easy low hanging fruit that should bring easy large gains by simply beefing up a few components.
@@HighYield Haha, c'mon, does a 'fan' get exclusive beautiful high-res die shots in private to make for a video? /s Anyways, great work! Although idk if it would make for a good video, and the question of attainability, I'd love to see Turin Dense die analysis. Zen5c has changes to it's CCX arrangement as well sharing L3$ unlike it being split like Zen2. And oh boy, are those CCDs looong, which kinda of makes Zen5c on desktop unlikely.
@@NootNoot. There are still a few project in work, for example a Nvidia GV100. I'd love more deep dives, but Fritzchens does whatever he wants to (and rightfully so).
Amazing, like looking into the furthest depths of the universe. Beautifully presented and explained. Nice balance between highly technical and overview. More please!
Bro, the images are bonkers. They look so good, even after zooming in that much. Thanks for the explanation of the chips, it makes the images even more amazing.
Great presentation! Technical enough to be widely understood. (I hope.) You should be on AMD’s marketing team. If the rumors about VCache overclocking are true, it makes sense that the VCache would be placed above the L3 cache, allowing for better cooling. It’s unfortunate this architecture doesn’t perform better in gaming. If the boost was closer to 8-10% instead of just 3-5%, I think gamers would have a much better opinion, and more people satisfied.
It's a fascinating question though: If clock-speeds are higher than with Zen4X3D, does that mean that the performance-improvement from X3D will actually be HIGHER? Since the Core-die doesn't have to clock down as much, that could mean that some of the negatives of X3D could be mitigated, and X3D-CPU's will henceforth be better for production-work and work which favours frequency over cache.
Incredible video, it's super interesting to see the details and thoughts and considerations that went in these chip designs, it's just fascinating to see...
I was wondering if there'll be less 3D cache per CCD, but AMD will make up for it by populating both CCDs, rather than just one like in the current 3D v-cache.
The problem with marketing BS are for games. even with seeing review and the wider bunch of things and interview with Mike Clark(chief arch of zen), it was definitely designed for Core workload. It true AMD marketing teams are kind of stupid, but you still need something to show. and it makes me wonder if there a spy from nvidia pulling strings with stupid marketing (RDNA3 and delayed RDNA4) on GPU side
@@noobgamer4709 Seems to be a combination of Windows 11 being atrocious and AMD exaggerating their gains rather than just AMD being dumb. The differences between Windows 10 and Windows 11 in performance are massive especially on Zen 5, check out Tech Yes City's recent vids on it.
The AMD has said that 3D cache version can clock closer to non cache version. So while 7000 and 9000 non cache versions have allmost same clockspeed… the 3D cache version 9000 version should clock higher than 7000 series does. And that is the main advantage this time.
Makes sense for binning purposes why there's no 800X non-3D CPU's anymore, especially for Zen 5 if the X3D can clock closer to non-3D. I suspect a 9700X3D will eventually surface that performs on par with the 7800X3D due to lower quality binning. AMD's naming convention gives them a lot of wiggle room for future Zen, with non-X, X, and X3D variants to fill in gaps if needed, or remove where redundant. I've been on the fence with Zen 5 X3D but these kind of deep dives give me hope that it'll finally be worth moving to AM5.
AMD has long had an option in the AGESA called "X3D", for which the options are "Auto, Disable, 1 stack, 2 stacks, 4 stacks" so I would not be surprised by a 2-hi stack
INTERESTING...! Could you give a reference for this claim? Would be very interesting if you could show this info' to High Yield, and have him determine if there's any correlation to potential multi-stacking.
@@predabot__6778 It is found under AMD CBS/Zen Common Options/X3D. Some motherboards expose this menu, but not all. ASROCK usually has these exposed, but this is not really a menu made by the motherboard vendor, it is all from AGESA which they get from AMD. But they can just choose weather or not they want to expose all of the knobs from AMD to the user.
@@predabot__6778 Acording to more law is dead, RDNA3 MCDs could have V-cache stacked 2 chips high if I remember correctly, MCDs do seem to have vias for extra cache for at least 1 extra layer
@@predabot__6778 AMD is shifting from AGESA to open source firmware after ZEN5 so anyone will be able to see it. They have some experimental implementations already operating with Zen5. It will fit well with coreboot IIRC
It's no surprise they use more than one stack height, considering EPIC x3d chips had like a gigabyte of v-cache even in previous generation despite using the same CCD chiplets.
Another great content and amazing Die-shot. It's really interesting how these change in L3 cache will affect the next X3D implementation, Well I hope we don't have to wait too long. Btw I prefer this kind of content and I wish more people will get over gaming benchmark, gaming isn't everything.
3D cache being double stack would make sense, they managed to reduce the connector sizes that much because they moved the logic gates from the CCD to the first layer of the 3D cache.
@@HighYield Oh yes ;-). Please. RUclips Video compression on itself is just bad at 1080p, convert a 1080p video to 4K 60FPS would already be great to improve quality on youtube. Thanks again for your videos and all the best
For me, the main thing that surprised me (after this video), is how much changed and yet how much AMD managed to stick to the performance of a refined and "aged" design. Whilst we all would love a 20% uplift after a redesign, Im personally very impressed we didn't see a more obvious drawback/regression like on MTL
I really expected a new I/O die with Zen 5, but the fact that it is the same perfectly explains why Zen 5 has the same difficulties with two memory sticks per channel as Zen 4.
And why the uplift of Zen 5 over Zen 4 is so small. Better cores that are as memory constrained as the previous gen which was already memory constrained...
It seems they're putting all their IO efforts into Zen6 but honestly I think it's unlikely we'll see significantly higher frequencies with 2+ DPC DDR5 as there are inherent signal integrity challenges that go beyond what is limited by the PHY and UMC. I'd expect an improvement but wouldn't be surprised if supported speeds still aren't significantly higher in these configurations.
@@JJFX- Maybe Zen5+ next year with the Zen 5 CCD but new IO Die? With all the AI hype they might accelerate some things. You're right though, we're clearly running into the limits of DIMMs, as for the first time in decades we're seeing new forms pop up like CAMM2.
@@MacGuyver85 Well I doubt there will be an iterative generation on desktop prior to Zen6. I see the most significant IO improvements going hand in hand with the advancements brought by Zen6. Even though Zen5 overhauled the core design, in some ways this generation can be viewed as Zen4++. The next generation will likely bring it all together as the pinnacle of AM5 prior to moving forward with DDR6.
Maybe the 3DV-Cache is just half the size from before, and just a bit faster! It would be a solution to raise the clock speed of the whole ship too while retaining most of the benefits of the 3DV-Cache. Doing more with less!
If that lets them keep the "cache tower" away from the ALU areas that would be a huge boon. The baked in L3 cache makes very little heat compared to the ALUs and FPUs, so keeping the clock speeds high on the X3D chips would be possible. Perhaps even allowing full overclocking support. If TSMC thinks their yields are good enough for mass production I can't see why AMD would chose not to.
I could see the new 3D V-Cache extending over the entire die with the actual memory elements being a little spread out. This would allow for more thermal vias to assist in transferring heat from the CCD, through the V-Cache, and into the cooler. With the vias being so much smaller, thermal expansion resulting from the two dies being a different temperature will become a bigger issue. Improved heat transfer through the V-Cache will be required. Keeping the two dies the same temperature will be very important. It should also help in allowing the new parts to reach similar frequencies as the non-V-Cache versions. On a side note, using smaller vias could help in decreasing the power consumption when communicating over the cache bus. It could also help in maximizing the frequency. As far as the physical connection between the two dies goes, a reduced via count is a big issue. I assume additional structural vias would be present along the edge of the dies. Vias can take up a shocking amount of space compared to transistors. One really wants to avoid them as one typically has to compromise the logic in order to fit them in there. So reducing the number of logic / power vias to only what is required then moving the structural vias to the edge is probably the way to go. If looking for additional vias, that is where I would look.
The large data centers that spend tens of thousands per high core count cpu offset a lot of the cost for the consumers. And as for development, it takes years for each part of the cpu to be developed with many teams behind each part. Stuff for zen 5 was being developed before the announcement of zen 3. They usually mention it on their roadmaps and presentations. Tech like CPUs and GPUs are always a few years behind from the real cutting edge because of this.
Way back in time an Intel design engineer was talking about the original 8080A CPU. He said something to the effect of "The first CPU cost us a hundred million dollars. The second one cost fifty cents."
@@robertsneddon731Yeah which remains true but was also before the cost and time of developing bleeding edge nodes was this absurd and getting more and more expensive for marginal gains. Now we're getting to the point with nodes marketing improvements at fractions of a nm.
So have they changed the power domain to allow the V-cache to run at higher frequency without blowing the normal L3 with excess Volts? If so then Robert Halleck told the truth and the tech world in general ignored him. From Chips & Cheese analysis there's some interesting design choices in Zen5 front end to save power on prediction that will stall, almost as if they planned the full fat Zen5 with V-cache in mind, because losing some work fast will be less common. The wider front end should allow later catching up as opcodes are retranslated into micro ops. Going with SMT and larger L3 will reduce context switching with expensive flushes and OS scheduling, while avoiding the drawbacks of 2 virtual threads halving effective cache size.
Every time I watch one of your deep dives I am amazed that humanity is able to make such intricate pieces of technology. People hating online about how AMD sucks because Zen 5 doesn't increase gaming performance enough really have no clue about the insane engineering that goes into making chips.
It really is unfathomably ironic that while zen 3 and zen 4 carried a lot of bagage from older zen generations, their actual performance uplift where massive, in the order of 20%. meanwhile zen 5 is almost a completely new design, with massive transistor count uplift, and can barely surpass zen4 in most applications. Hysteric
IODie is the answer: it's the same. Zen 4 gets a massive uplift from the 3D V-Cache, in other words the Zen 4 cores are massively memory constrained in many workloads. What happens if you improve the cores but keep the same memory constraints? Not much. Hence why Zen 5 3D V-Cache should see a dramatic uplift.
My guess is AMD isn't taking full advantage of the new architecture yet because performance wasn't their primary goal. I know that sounds like cope but I think we'll see bigger performance gains later. Right now AMD got it to work without losing performance and still gained some efficiency iirc. Feel free to correct me, I haven't been paying close attention to all the numbers.
@@spamcan9208 In most workloads, the efficiency gain isn't good either. People are comparing 65W zen 5 parts to 105W zen 4 parts, hardware unboxed did good coverage on that. Linux gains are also mostly overstated from benchmarks i've seen. Also, why would windows selectively degrade zen 5 performance, when almost every other ryzen generation saw 15-20% gains in windows? Why would the IO die bottleneck the cores so hard, when zen 3 used zen 2 IO and still achieved ~19% perf uplift? IMO the problem is in the agesa/driver side of things, but its just speculation at this point
i just stumbled across your channel by pure luck.. didn't know we tech geeks can dive this deep into processors. you earned a new subscriber 🫡 hopefully alot more soon. EDIT: is it possible for us to download these images ?
SRAM cells hardly shrink with silicon node anymore (allegedly because of the required number of wires to and from each bit cell). So your idea of a double layer VCache seems plausible, despite the cost and yield disadvantages. I am looking forward to AMD presenting what they engineered here.
Integer core contains 8 alu,l1 cache, 192 integers registers 8 Transistor per registers ,and control unit and may be some more things they don't reveal obviously
Thanks for the video it was very interesting. I like to see the inner workings of what makes my pc work even though i have no idea what i'm looking at.
Stacked 3D V-Cache is no impossible, the first generation server boards had an option to select the number of layers in VCache (1/2/4 stacks), at least the setting was in the BIOS/UEFI.
That is insane. Zen 5 may have a couple tricks up its sleeve for vcache after all. I dont see how they can operate at higher frequency wothout the double stack which was stated unlikely though. That would be where most performance improvements would come from over Zen 4. But this base of Zen 5 seems to bode well for Zen 6 imo
Doing seperate IO and CPU chiplet design was smart move from AMD. So many possibilities opened at once: can use different node sizes, reuse existing designs, if a need arise just change one of chiplet design, not entire CPU, save some cash on production,.... At first I was skeptical about that, but it turned out great.
Amazing analysis of the pictures, great content I wonder what the perf would be if the 9800X3D came with 160MB (32 + 64 + 64) of L3 cache, would be super exciting if that was the case
At this point I'm more interested to see what Zen 6 APUs are capable of, it will be the integration between Zen 6 and RDNA5 that really will detail whether AMD has mastered optimal interconnect. Especially whether they are able to keep up in a world dominated by ARM and NVidia.
The resolution of those die shots is actually insane
Insane die shots from an insanely talented person: Fritzchens Fritz 😎
@@HighYieldplease tell me the nickname is Fritz²
@@HighYielddoes he make content of how he's able to produce these images? I'd watch
Yeah, the resolution blew me away. The level of detail is something I thought only engineers at the likes of AMD/TSCM/Intel had access to.
@@HighYield
i want to get an 8 core 9700X / and a 12 or 16Core 9900X/9950X but also both an 8core amd 16core X3D (especially if they have both v cache CCDS (or the 3D Cache/Inf fabric Link cache Bridge Die to not only have more L3 on one CCD the other has access as well through the substraight and ccd >< CCD (Bidirectional ) 😑🤤 think of the 3D V-cache (even if only on one CCD) has think of it as 4 channels rather than 2ch // if that makes sence ??? (cached data has silicon to silicon (aka 3D cache chiplet to CCD Communication but also communication over the Inf fabric in the substraight, but also the CCDs Inf fabric links on top of the CCD as well AND from CCD 2 CCD also ? hence my 3DCache Inf fabric bridge chiplet ? 🤤
Such a nice channel! Wish I knew it earlier :D Great video!
Ein Kommentar vom 8auer, jetzt kann ich glücklich sterben ;) Dankeschön!
Verdientes Lob !
dercucker!
Yeah, nun verstehe ich AMD, weshalb es den TDP Wert vo einigen Wochen so niedrig getrimmt hatte. In deinem Video hast du nur kurz über Temperatur geredet und vermutlich hatte dadurch AMD Angst bei höherer Taktung auf den CCD zu viel abwärme produziert wird und somit zum Totalausfall wird. Nach einpaar Wochen hatte sich die Angst via AMD als unbegründet herausgestellt. Ansonsten ein super Video und auch super erklärt. P.S. English is good - not very good. Hier fehlt ein wenig der Slang. 🙂
Du musst echt den Kanal bingen. Lohnt sich.
It's a pity that this kind of deep-dive content has such a niche audience. Your channel deserves to be far bigger than it is.
I already have a much larger audience than I ever imagined. I'm pretty happy. Plus it's cool to have such a tight-knit community :)
Qualtity is just as important as quantity!
I hope you get invited to other channels that use this video in their content! LTT is probably too basic, but techtech potato, level one techs and some of the more tech heavy ones😊
Such is the enthusiast side of an already esoteric topic. I barely found this channel and am now subscribed!
nah, everything is ruined when it goes from niche to mainstream.
Double stacked 3D cache?
Welcome back, consumer HBM
i havent looked at it like this but the more i think about the more i find it super funny xd
Welcome back, Zen 4 performance. Now called Zen 5.
@@Fin1nishingMoveConsumer tasks won't see Zen 5 gains. Zen5 gains more on Linux and more specialized tasks. Zen5 also gains efficiency compared to previous Zen4 release.
Double stacked 3D cache on both ccd's for 12c/24th and 16c/32th?
I've also pondered Zen 5c for either mixed Z5+Z5c or all Z5c for up to 24/48. Idk if 5c cores can clock as high. But, my laymen guess is 5c's lower cache per core wouldn't be an
issue with the double stacked 3D cache. I've also thought about chips like the 8700F getting 3D cache. Again, this is just me thinking aloud. But, I could see those defective APU's
being repurposed into budget 3D options that would perform between standard 7700X/9700X and 7800X3D/9800X3D. Or, would the halved 16mb L3 on the die itself be too little?
This is kind of a part 2 to my first comment... What I'm getting at is how important is having a regular pool of L3 cache on the die directly attached to the cores versus just having
the large 3D cache to compensate? I've heard people say that packing smaller (but equal ipc) Zen 5c cores in wouldn't hurt performance because of that large pool of 3D cache.
That way you could have hybrid 20c/40th, or a 24c/48th all Zen 5c chip. Like I stated before, this is just me as a laymen asking if this would work. Idk if the clocks would have to
be significantly lower on the 5c cores either. I do know that AMD has used hybrid Zen 4/4c in Phoenix APU's. And, when all cores ran at a fixed clock performance was the same.
Having 2-layer 3D-cache makes sense given the rumors that the new X3D-CPU's can boost higher. More compact 3D-cache that is further away from the compute cores would be less sensitive to heat coming from the compute cores.
As the SRAM cache is now probably stacked now on multiple layers. Wondering if it suffers also from exes heat. Question is how much can you stack before heat dissipation starts to become issue and making it more unstable etc... Even SRAM is more efficient and generates less heat.
Wasn't the main issue with the clockspeeds the lower voltage tolerance of v-cache, not the heat? The heat issue is solved with lower power limit and lowered thermal limit, but it should have still been able to clock high for as long as it had the thermal headroom instead of having locked core voltage
@@shepardpolskaSaying new X3D chips could "clock higher" really just means in relation to the non-X3D configuration. Since zen5 effectively has the same freq limits as zen4 while generally being more efficient, it's very likely X3D will get closer to this because it'll have more headroom. Especially since X3D uses binned chips capable of running lower voltages.
I don't understand. How would a higher stack the heat has to be dissipated through solve any heat issues?
@@unvergebeneid Heat comes more from the cores than from the cache. That's why they stack on the L2 cache in the first place.
The 35% reduction in L3$ is quite an amazing feat. Also, with the TSVs, this is another "generational" change, and I'm impressed with how far R&D goes to improve it unlike the IOD.
I think the lack of IOD improvement is kinda the point of the design lol
@@RyTrapp0 I don't disagree. When AMD has 3D V-cache for gaming, from their POV might as well save R&D until necessary.
@@NootNoot. AMD would probably prefer staying with the same I/O die for an entire memory generation, but I think we will see at least minor a revision, as the current one caps out at 6000MT/s.
OK, you have managed to engage my interest for Zen 5. This is the kind of analysis I have been missing since Jim from @AdoredTV stopped making tech content. That and his "Awright, guys, how's it goin'?".
yeah I miss him too
Same. I even began to understand the Scottish accent better thanks to him.
Jim is a great guy but he just wasn't built for the modern internet unfortunately.
I miss him too for sure
Miss his content, but totally get why he stopped. At least he helped really get this content niche off the ground for channels like this one.
Gosh those super tight-in shots near the TSVs just blew my mind. Reminds me of just how exquisitely complex modern processors are, and of course it's a tour de force in macro photography.
AMD like had extra SI vias for the previous generations to increase yields. Now that the process has matured, they don’t need them. The extra vias could have caused the power problems of the previous gens that restricted over clocking.
Interesting idea, it does make a lot of sense.
Yeah, removing redundancy coupled with some further improvements makes sense.
This needs many, many more views. The fact that the leaked figures show much more than 'Zen5%' over Zen 4, I think you are definitely onto something.
The fact AMD has found issues all over the place, and RUclipsrs like HBU have as well indicates that something on the consumer PC end isn't capable of pushing Zen 5 to their potential.
It took me a whole undergrad just to understand some terminologies present in video. "Just Understand". Semiconductor is quite interesting. Stuff like these make me realize there is so much more to explore. I hope more people find such stuff interesting.
Hey RUclips algorithm push this video. I hope "this" helps.
The TSVs pointing to a smaller V-cache die is really interesting if you take into account Semi Analysis' deepdive on Zen 4c.
In the article they do mention that part of the area savings are from moving to an (early production at the time) 6 transistor cache design rather than the traditional 8 transistor design commonly used in most CPUs currently.
Without control structors, with the new smaller TSVs, with the reduction in dead silicon, and using the Zen 4c tested 6 transistor cache cell, I could see them managing to just shrink it enough to fit on the new L3 + L2 area indicates by the new TSV structure.
Regardless, either this way, or the double stack proposed in the video will be interesting to analyze. Can't wait.
OMG THANKS SO MUCH for this video!!
I really do appreciate the invaluable work that goes into these videos. The only other people who go above and beyond for such a niche subject matter are Anandtech, who are now defunct, and Chips & Cheese. Thank you for your effort.
Didn't know about Chips & Cheese. Thx mate.
Excellent explanation. I want to see how X3D cache will be implemented on Zen 5 since some reports tell that the 9950X3D and 9900X3D will bring something new.
Yes the 2 CCD parts will both have stacked cache
That's amazing! Thanks for these inspections.
+1 for stats
Thx for your work. I love topics about internals 🙂
Your videos keep getting better and better
AMD kept saying it was a redesign. Your video highlights this. It seems like most people can't see past minimal IPC uplift and the introduction of full avx512. Excited to see what 3Dvcache brings for this gen.
Don’t forget the new branch predictor that can look 2 branches out instead of 1.
@@Fractal_32 Wait, that's insane 😮
@puilp0502 Check out the chips and cheese article on the topic, I tried supplying a link to the article but RUclips did not like that comment.
Yup. SO many comments about "AMD is getting lazy now" and what not, as if Zen 4 to Zen 5 was just Intel 13th gen to 14th gen. It can't just be that someone screwed up somewhere in this or that they just didn't get the results that they were expecting, it's always gotta be some stupid conspiracy about the chip manufacturers being Scrooge McDuck, doing everything they can not to have to make better chips. It's incredibly frustrating to try to have meaningful discussions in the tech community.
These modern high performance processors are easily the most advanced pieces of bleeding edge tech that we have in our daily lives - and even enthusiasts take it for granted.
Most people don't understand that a major architecture redesign doesn't always improve performance significantly at first, there can even be performance regression in some aspects. But updating incrementally without a major rebuild eventually gets diminishing returns. So throwing everything in the air and rebuilding with a new floorplan allows large structures to change, this also increases the headroom for improvements later. Making a large structure change on an existing floorplan means working around existing structures and wasting transistors to work around stuff, squandering the transistors a new process node brought. Eventually things get complicated and your "low hanging fruit" updates are gone, like beefing up some execution units to increase FP or INT performance.
Eventually you need a whole lot of stuff to change at once or you suffer bottleneck whack a mole. You need a more complex branch predictor to handle more throughout, beef up all the low level caches to handle that, completely redo the execution side to support 512 bit wide paths. Everything gets built bigger essentially.
Zen 5 is fresh and should have lots of headroom to improve with future process nodes. There's lots of easy low hanging fruit that should bring easy large gains by simply beefing up a few components.
@HighYield and @FritzchensFritz the impeccable duo!
Honestly, Fritz is the one with the talent here. I'm just his fan :D
@@HighYield Haha, c'mon, does a 'fan' get exclusive beautiful high-res die shots in private to make for a video? /s
Anyways, great work! Although idk if it would make for a good video, and the question of attainability, I'd love to see Turin Dense die analysis. Zen5c has changes to it's CCX arrangement as well sharing L3$ unlike it being split like Zen2. And oh boy, are those CCDs looong, which kinda of makes Zen5c on desktop unlikely.
@@NootNoot. There are still a few project in work, for example a Nvidia GV100. I'd love more deep dives, but Fritzchens does whatever he wants to (and rightfully so).
@@HighYield Excited for what comes next!
This die shot was exclusive to you? :0 I'd die (pun very intended) to get it@@HighYield
Wow, absolutely stunning photos and fascinating breakdown. Very informative.
Love your work! and Fritz of course
Amazing, like looking into the furthest depths of the universe. Beautifully presented and explained. Nice balance between highly technical and overview. More please!
Bro, the images are bonkers. They look so good, even after zooming in that much.
Thanks for the explanation of the chips, it makes the images even more amazing.
Thanks for the video. Always looking forward to new content from you!
Great presentation! Technical enough to be widely understood. (I hope.) You should be on AMD’s marketing team.
If the rumors about VCache overclocking are true, it makes sense that the VCache would be placed above the L3 cache, allowing for better cooling.
It’s unfortunate this architecture doesn’t perform better in gaming. If the boost was closer to 8-10% instead of just 3-5%, I think gamers would have a much better opinion, and more people satisfied.
It's a fascinating question though: If clock-speeds are higher than with Zen4X3D, does that mean that the performance-improvement from X3D will actually be HIGHER? Since the Core-die doesn't have to clock down as much, that could mean that some of the negatives of X3D could be mitigated, and X3D-CPU's will henceforth be better for production-work and work which favours frequency over cache.
@@predabot__6778 That's the hope.
Thank you. This is fantastic!
Incredible video, it's super interesting to see the details and thoughts and considerations that went in these chip designs, it's just fascinating to see...
AMD has promised "exciting changes" to the x3d chips, and the die shots seem to show that there really is more than just marketing BS happening
I was wondering if there'll be less 3D cache per CCD, but AMD will make up for it by populating both CCDs, rather than just one like in the current 3D v-cache.
The problem with marketing BS are for games. even with seeing review and the wider bunch of things and interview with Mike Clark(chief arch of zen), it was definitely designed for Core workload. It true AMD marketing teams are kind of stupid, but you still need something to show. and it makes me wonder if there a spy from nvidia pulling strings with stupid marketing (RDNA3 and delayed RDNA4) on GPU side
@@noobgamer4709
Seems to be a combination of Windows 11 being atrocious and AMD exaggerating their gains rather than just AMD being dumb.
The differences between Windows 10 and Windows 11 in performance are massive especially on Zen 5, check out Tech Yes City's recent vids on it.
Thanks for the beautiful analysis :3
Thank you for the work you do, it helps solidify thoughts of these products by knowing the physical characteristics.
Great analysis as always. I hope that Zen 5 X3D brings a larger leap in gaming performance than regular Zen 5 did.
The AMD has said that 3D cache version can clock closer to non cache version. So while 7000 and 9000 non cache versions have allmost same clockspeed… the 3D cache version 9000 version should clock higher than 7000 series does. And that is the main advantage this time.
@@haukionkannel if all of this is true then the 9800 X3D will be worth the buy.
Makes sense for binning purposes why there's no 800X non-3D CPU's anymore, especially for Zen 5 if the X3D can clock closer to non-3D. I suspect a 9700X3D will eventually surface that performs on par with the 7800X3D due to lower quality binning. AMD's naming convention gives them a lot of wiggle room for future Zen, with non-X, X, and X3D variants to fill in gaps if needed, or remove where redundant. I've been on the fence with Zen 5 X3D but these kind of deep dives give me hope that it'll finally be worth moving to AM5.
Fantastic video.
Great content, as always.
AMD has long had an option in the AGESA called "X3D", for which the options are "Auto, Disable, 1 stack, 2 stacks, 4 stacks" so I would not be surprised by a 2-hi stack
INTERESTING...! Could you give a reference for this claim? Would be very interesting if you could show this info' to High Yield, and have him determine if there's any correlation to potential multi-stacking.
@@predabot__6778 It is found under AMD CBS/Zen Common Options/X3D. Some motherboards expose this menu, but not all. ASROCK usually has these exposed, but this is not really a menu made by the motherboard vendor, it is all from AGESA which they get from AMD. But they can just choose weather or not they want to expose all of the knobs from AMD to the user.
@@predabot__6778 Acording to more law is dead, RDNA3 MCDs could have V-cache stacked 2 chips high if I remember correctly, MCDs do seem to have vias for extra cache for at least 1 extra layer
@@predabot__6778 AMD is shifting from AGESA to open source firmware after ZEN5 so anyone will be able to see it. They have some experimental implementations already operating with Zen5. It will fit well with coreboot IIRC
It's no surprise they use more than one stack height, considering EPIC x3d chips had like a gigabyte of v-cache even in previous generation despite using the same CCD chiplets.
Wow, simple and clear, love it!!
Another great content and amazing Die-shot. It's really interesting how these change in L3 cache will affect the next X3D implementation, Well I hope we don't have to wait too long.
Btw I prefer this kind of content and I wish more people will get over gaming benchmark, gaming isn't everything.
Very well done video. Thanks for sharing!!
I've been waiting for this one 🎊
3D cache being double stack would make sense, they managed to reduce the connector sizes that much because they moved the logic gates from the CCD to the first layer of the 3D cache.
This is insane.... Thank you sir!!!
This is just awesome. I literally could watch hours long videos of all this silicon pr0n further zoomed in.
I need to start doing 4K.
@@HighYield Oh yes ;-). Please. RUclips Video compression on itself is just bad at 1080p, convert a 1080p video to 4K 60FPS would already be great to improve quality on youtube. Thanks again for your videos and all the best
@@HighYield Could be wrong but I think even the 1440p youtube res has double the bitrate of the 1080p. And 4k is locked to youtube premium these days
What an audio upgrade!!!
Love the effort and concept in the video. Thank you
may your channel explode. Thank you sir
The editing and explanation. Man hope you're going 1 million.
Very cool dieshots. Pictures like these makes me think that we are underpaying so much for this engineering marvel.
godlike video man
Great videos! Back in the days, Anandtech would have a few dense articles on architecture design.
It's nice to have this with channel.
Thanks for this video!
You’re welcome!
awesome deep dive, thanks a lot!
awesome ! I've been longing for this video 👏👏👏
For me, the main thing that surprised me (after this video), is how much changed and yet how much AMD managed to stick to the performance of a refined and "aged" design.
Whilst we all would love a 20% uplift after a redesign, Im personally very impressed we didn't see a more obvious drawback/regression like on MTL
great vid, great die shots!
Phenomenal video 👍👍
Great Video, Thank you ! Side note, at 7:19 you wrote Core 4 twice on the right, instead of Core 5 :)
Thanks a lot for this deep dive 😉👍
I really expected a new I/O die with Zen 5, but the fact that it is the same perfectly explains why Zen 5 has the same difficulties with two memory sticks per channel as Zen 4.
And why the uplift of Zen 5 over Zen 4 is so small.
Better cores that are as memory constrained as the previous gen which was already memory constrained...
It seems they're putting all their IO efforts into Zen6 but honestly I think it's unlikely we'll see significantly higher frequencies with 2+ DPC DDR5 as there are inherent signal integrity challenges that go beyond what is limited by the PHY and UMC. I'd expect an improvement but wouldn't be surprised if supported speeds still aren't significantly higher in these configurations.
@@JJFX- Maybe Zen5+ next year with the Zen 5 CCD but new IO Die? With all the AI hype they might accelerate some things.
You're right though, we're clearly running into the limits of DIMMs, as for the first time in decades we're seeing new forms pop up like CAMM2.
@@MacGuyver85 Well I doubt there will be an iterative generation on desktop prior to Zen6. I see the most significant IO improvements going hand in hand with the advancements brought by Zen6. Even though Zen5 overhauled the core design, in some ways this generation can be viewed as Zen4++. The next generation will likely bring it all together as the pinnacle of AM5 prior to moving forward with DDR6.
I think zen6 will be on a new socket with ddr6 and pcie 6
Thanks for the great work!
Maybe the 3DV-Cache is just half the size from before, and just a bit faster!
It would be a solution to raise the clock speed of the whole ship too while retaining most of the benefits of the 3DV-Cache.
Doing more with less!
Great video as always
Thank you for bringing this info to us.
Thank you very much for this content!
I'm so gratefull.
amazing deep dive
Very well done!
I love this so much, thank you for your amazing work.
Dubbel stacking 3d v cache would be ballsy.
If that lets them keep the "cache tower" away from the ALU areas that would be a huge boon. The baked in L3 cache makes very little heat compared to the ALUs and FPUs, so keeping the clock speeds high on the X3D chips would be possible. Perhaps even allowing full overclocking support. If TSMC thinks their yields are good enough for mass production I can't see why AMD would chose not to.
Great job! I wonder how much, if any, extra performance will they be able to get out of this new architecture as software catches up.
Oh, finally. I was waiting for high-res pictures of Zen 5.
Super Arbeit. Danke für das tolle Video. Eine deutsche Version wäre natürlich der Hammer.
Well done! Maybe we will see a A-synchronous Chiplet Design again when AMD can use two extra layers of 3D-VCache on one Die....
Great review! Really interested to see 9800X3D announcement now! Planning a CPU/Mobo upgrade and will very likely buy it ASAP!
I could see the new 3D V-Cache extending over the entire die with the actual memory elements being a little spread out. This would allow for more thermal vias to assist in transferring heat from the CCD, through the V-Cache, and into the cooler. With the vias being so much smaller, thermal expansion resulting from the two dies being a different temperature will become a bigger issue. Improved heat transfer through the V-Cache will be required. Keeping the two dies the same temperature will be very important. It should also help in allowing the new parts to reach similar frequencies as the non-V-Cache versions.
On a side note, using smaller vias could help in decreasing the power consumption when communicating over the cache bus. It could also help in maximizing the frequency.
As far as the physical connection between the two dies goes, a reduced via count is a big issue. I assume additional structural vias would be present along the edge of the dies. Vias can take up a shocking amount of space compared to transistors. One really wants to avoid them as one typically has to compromise the logic in order to fit them in there. So reducing the number of logic / power vias to only what is required then moving the structural vias to the edge is probably the way to go. If looking for additional vias, that is where I would look.
Wow, RUclips schlägt diese Video vor... damit hätte ich nicht gerechnet.
Fix ein Abo verdient!
Thank you.
Incredible video.
To me, this is a lot harder than rocket science. It is amazing how relatively cheap they can sell these wonders.
The large data centers that spend tens of thousands per high core count cpu offset a lot of the cost for the consumers. And as for development, it takes years for each part of the cpu to be developed with many teams behind each part. Stuff for zen 5 was being developed before the announcement of zen 3. They usually mention it on their roadmaps and presentations. Tech like CPUs and GPUs are always a few years behind from the real cutting edge because of this.
Way back in time an Intel design engineer was talking about the original 8080A CPU. He said something to the effect of "The first CPU cost us a hundred million dollars. The second one cost fifty cents."
@@robertsneddon731Yeah which remains true but was also before the cost and time of developing bleeding edge nodes was this absurd and getting more and more expensive for marginal gains. Now we're getting to the point with nodes marketing improvements at fractions of a nm.
Love this content
So have they changed the power domain to allow the V-cache to run at higher frequency without blowing the normal L3 with excess Volts? If so then Robert Halleck told the truth and the tech world in general ignored him.
From Chips & Cheese analysis there's some interesting design choices in Zen5 front end to save power on prediction that will stall, almost as if they planned the full fat Zen5 with V-cache in mind, because losing some work fast will be less common. The wider front end should allow later catching up as opcodes are retranslated into micro ops.
Going with SMT and larger L3 will reduce context switching with expensive flushes and OS scheduling, while avoiding the drawbacks of 2 virtual threads halving effective cache size.
Every time I watch one of your deep dives I am amazed that humanity is able to make such intricate pieces of technology. People hating online about how AMD sucks because Zen 5 doesn't increase gaming performance enough really have no clue about the insane engineering that goes into making chips.
It really is unfathomably ironic that while zen 3 and zen 4 carried a lot of bagage from older zen generations, their actual performance uplift where massive, in the order of 20%.
meanwhile zen 5 is almost a completely new design, with massive transistor count uplift, and can barely surpass zen4 in most applications. Hysteric
*in windows
@@dhuckbourning7165*and only in games.
In serious workloads like for servers it looks very promising.
IODie is the answer: it's the same. Zen 4 gets a massive uplift from the 3D V-Cache, in other words the Zen 4 cores are massively memory constrained in many workloads.
What happens if you improve the cores but keep the same memory constraints? Not much.
Hence why Zen 5 3D V-Cache should see a dramatic uplift.
My guess is AMD isn't taking full advantage of the new architecture yet because performance wasn't their primary goal. I know that sounds like cope but I think we'll see bigger performance gains later. Right now AMD got it to work without losing performance and still gained some efficiency iirc. Feel free to correct me, I haven't been paying close attention to all the numbers.
@@spamcan9208 In most workloads, the efficiency gain isn't good either. People are comparing 65W zen 5 parts to 105W zen 4 parts, hardware unboxed did good coverage on that. Linux gains are also mostly overstated from benchmarks i've seen.
Also, why would windows selectively degrade zen 5 performance, when almost every other ryzen generation saw 15-20% gains in windows?
Why would the IO die bottleneck the cores so hard, when zen 3 used zen 2 IO and still achieved ~19% perf uplift?
IMO the problem is in the agesa/driver side of things, but its just speculation at this point
i just stumbled across your channel by pure luck..
didn't know we tech geeks can dive this deep into processors.
you earned a new subscriber 🫡
hopefully alot more soon.
EDIT:
is it possible for us to download these images ?
You can download them here: www.flickr.com/photos/130561288@N04/albums/72177720320942793/
This is amazing!
SRAM cells hardly shrink with silicon node anymore (allegedly because of the required number of wires to and from each bit cell). So your idea of a double layer VCache seems plausible, despite the cost and yield disadvantages. I am looking forward to AMD presenting what they engineered here.
Can't wait for a full indepth Lion Cove and Skymont analysis when ARL comes out!
The die shot is amazing, looks like a factory, no wonder it does eveything in the PC
how can you be so interesting?
Integer core contains 8 alu,l1 cache, 192 integers registers 8 Transistor per registers ,and control unit and may be some more things they don't reveal obviously
did some measurements, the 4KB page bank of the L3 has a total surface area of only 886.24µm²
Your voice is very ASMR. Also, this deep dive into ZEN 5 is very interesting and informative.
Thanks for the video it was very interesting. I like to see the inner workings of what makes my pc work even though i have no idea what i'm looking at.
Excellent work. This is brain candy for my nerd interests. Keep it up
I actually wouldn't be surprised, if AMD squeezes NPU somewhere into IO die for Zen6
Nice breakdown. Would LOVE to see something similar for the Threadripper or Epyc, or Intel's Xeon cpu's.
Stacked 3D V-Cache is no impossible, the first generation server boards had an option to select the number of layers in VCache (1/2/4 stacks), at least the setting was in the BIOS/UEFI.
That is insane. Zen 5 may have a couple tricks up its sleeve for vcache after all. I dont see how they can operate at higher frequency wothout the double stack which was stated unlikely though. That would be where most performance improvements would come from over Zen 4. But this base of Zen 5 seems to bode well for Zen 6 imo
Doing seperate IO and CPU chiplet design was smart move from AMD. So many possibilities opened at once: can use different node sizes, reuse existing designs, if a need arise just change one of chiplet design, not entire CPU, save some cash on production,.... At first I was skeptical about that, but it turned out great.
Amazing analysis of the pictures, great content
I wonder what the perf would be if the 9800X3D came with 160MB (32 + 64 + 64) of L3 cache, would be super exciting if that was the case
Dhanyavad
At this point I'm more interested to see what Zen 6 APUs are capable of, it will be the integration between Zen 6 and RDNA5 that really will detail whether AMD has mastered optimal interconnect. Especially whether they are able to keep up in a world dominated by ARM and NVidia.