I had an i7-3820 with quad-channel memory... This was Intel's E platform, which they ditched. Threadripper is also quad-channel and can be called a consumer CPU.
I do wonder if AMD would make an L3/L4 cache by stacking the 3D cache chip across the core chiplets (so one cache chip would be connected to both core chips)
They're sort of doing that with the MI300 APU: the cache is the bottom layer with the compute dies on top. It resolves the heat dissipation issue so clocks can still be higher, but requires more complexity to get the voltage for the compute dies to go through the cache dies, so I can imagine that solution will make its way to desktop chips once the cost comes down.
@@Soutar3DG Yeah, I was excited when that thing came out. They've been working on this fusion concept since, what, 2010? 2008? And it's finally coming into reality.
Really hard to know who is at fault - the foundries or the design teams. It always seemed to be the former, but lately more and more sources claim the foundries are doing better than expected and it's the design teams at Intel messing up. I don't think AMD themselves have decided many of the things you mentioned. Either way, I am looking forward to the results. I would love a desktop chip with Zen 4c cores only, or at least in a year or 2, Zen 5c chips only. I hope the efficiency gains will be enough for productivity that these non-gaming cores will be a thing even for consumers, not only servers.
Yeah, if the design teams were used to the foundries being able to "bend" their process to help fix some of their issues, I could see that being a problem now that nodes have shrunk so much that the very laws of physics mean the foundries aren't able to be as flexible. If you are used to working with a third party, e.g. TSMC, then you know you can only count on them so much and have to do the rest yourself.
Zen 1 ended Intel's gaming dominance because it became the talking point of every gamer in the world. Benchmarks and all that don't matter when you're objectively making a better product that uses less power, costs less money, and has a viable path forward in a platform. For idiots who still think Intel is better, well, it's a shame to have 0 IQ, innit?
There are always people who only look at shallow statistics in every market, and who either don't care, don't think about longer-term implications, or just aren't very good at cost-benefit considerations. The smart money started flowing towards AMD in this case for sure.
In the grand scheme of things most of these benchmarks simply do not paint a realistic or even useful picture. How many people game on dedicated, constantly fresh Windows installs, and how many people buy only the highest-performance parts for an entire build? Add any extra applications running and it influences performance, and generally shifts it towards higher core count CPUs. Skimp a few hundred bucks on the GPU and don't buy the absolute fastest available, and a 5 or 10% performance lead for one CPU or another can basically evaporate completely.
I have been hoping someone comes along and starts a channel dedicated to benching more realistic systems. I see nobody testing which CPU would be the best combination with a 3060 GPU, for example. How much CPU power do you need to match a certain GPU? Which new-gen GPU upgrades make sense for a 2 or 3 year old higher-end CPU? That kind of stuff. I think this would be much more useful information for consumers.
It's very possible the stacking/bonding process currently bottlenecks production while process throughput is being scaled up. In other words, increased cost due to limited capacity on this specialized process, rather than material/silicon costs. This would explain some of the odd choices in both configuration and marketing speak. Just a guess though.
Being careful while learning the process is probably pragmatic too, once they have it well understood and fine tuned they can go ham with insane stacking arrangements and monstrous CPU / GPU units. I'm happy with my Ryzen 2600x and Radeon RX 5700XT right now, but when I'm ready to upgrade again there are going to be some crazy options on the table...
I must say that the IBM PC standard needs a massive overhaul in itself. It's been 40-50 years and it has served us well, but now we are reaching the limits of the platform.
The ability to park the unused chiplet is such a huge advantage, not just in terms of power draw, but performance as well. The chiplet without V-cache can clock higher than the other chiplet, making it better at many applications. So you get a choice between V-cache or higher clock speeds. Besides, most games don't need more than 8 cores, so there isn't much of a penalty to being limited to one chiplet.
If it's done automagically then sure, but it does offer a quirky value proposition if you're buying a CPU where you get half the cores for gaming but all of them for productivity... (Although Intel's small cores are basically that same value proposition)
Performance benefit of dual V-cache dies? Depends on your workload. One thing I noticed in your screenshots (for example at 9:44) is that the non-V-cache cores are running a full 1000 MHz faster than the V-cache cores (with the exception of core 0 in the single-core test). So if you need high clock speeds for some workloads, the non-V-cache die actually does give you a performance boost. For pure gaming workloads, the 7950X3D is a waste of money in the first place - wait a couple more months for the 7800X3D. We have to consider the benefit of the higher clocks on the non-gaming workloads when we talk about overall performance of the 7950X3D.
I feel they are just holding back dual-cache stacking for if Intel comes back with something, so they have an easy win by adding stacks to both CCDs for next-gen chips.
I've watched for the last 25 years as "AMD set to end Intel's dominance" headlines have come and gone. That includes people telling me "this time is different" so don't bother.
22:40 But this will also increase core-to-core latency on the same chiplet, because there is a larger structure to navigate, just like cache latency increases with size? This will cause a drop in light-to-medium threaded total IPC? V-cache would of course hide this, along with other improvements as well. I guess we can only speculate.
Isn't wired stacking effectively a semi-experimental tech still, with low order volumes? If so, economy-of-scale principle: low orders -> expensive manufacturing. So die bonding might be pretty pricey right now.
Yeah, Zen 3 X3D was basically a working prototype, with Zen 4 X3D being v.1.0. Looking forward to seeing if Zen 5 V-cache solves the voltage issue so we can run V-cache chiplets at closer to 6 GHz.
In theory that is true. AMD is past the inflection point where costs come down dramatically, though. They're doing tens of millions of chips annually (including Epyc) with V-cache already, and the packaging process is all automated because of the technical requirements. The cost isn't really going to come down significantly from what it already is. Now, the price they're paying TSMC and the margins they're capturing is a whole different discussion. I strongly suspect AMD is paying a pretty penny and TSMC is just using that pricing power to increase their own margins. That equation won't change with volume; it will only change with time, if TSMC gets some actual competition.
Their first glued-together chip was the Pentium D, which was two Pentium 4 dies placed on the same package. No communication between dies except through the front-side bus, which means it was actually SMP on a single socket. The Core 2 Duo was a proper dual-core chip. But their Core 2 Quad chips were two Core 2 Duo dies on a single package, also effectively SMP on a single socket. Calling those "glued together" is appropriate, given the lack of communication on the package. AMD's multi-die chips, however, had very high-speed communication on the package right from the start. So Intel's attempted repeating of the "glued together" taunt they had received 10+ years earlier fell flat.
Two V-cache CCDs would be slower than one in gaming because of between-chip latency. You can see this with the 7700X beating the 7950X in many games, even with a lower max frequency and power limits.
I'd like to address the 7950X3D claims that games would benefit from having 2 V-cache CCDs. From what I know, the V-cache CCD has much lower clock speeds, and this reduces performance in some games even with the extra cache. So the reasoning for 1 normal CCD and 1 V-cache CCD is that the game will use whichever one brings it bigger performance. Essentially, 2 V-cache CCDs would give lower performance in total than 1/1; it's not because of money.
When calculating the cost of adding the extra cache to both dies of the 7950x3d, you forgot to consider the opportunity cost. Securing wafer orders at TSMC isn't something to be done lightly. They may have been constrained by how many wafers they had ordered. They can sell a lot more 7950x3d chips if they cut the vcache in half if those 7nm wafers are the limiting factor.
7nm production is down, if I recall the recent news, meaning that theoretically there should be lots of excess capacity available. So I'm not sure 7nm volume was a limiting factor here.
I don't understand why you say that 3D cache on both 7950X3D would be better? First of all, 3D cache only on one die is cheaper. Second of all, now you have better single-core performance since the non-3D cache die clocks higher, which is more important for some applications. Sure, there are some scenarios where it chooses the wrong die like in the Factorio benchmark, but that should hopefully get fixed.
Well put. Jim is a creator, and so productivity performance matters to him. He said so himself in this video. Why does he want V-cache on both chiplets when that would kill productivity?
I chose intel as my computer never turns off and that low idle power draw is a must. But I am worried about future intel chips using tiles would increase their idle power draw. AMD have high idle power draw due to their chiplet design.
The 7900X3D and 7950X3D actually park the non-3D-cache die, so when gaming it really is an 8-core with extra cache. The issue is that this relies on the OS's thread management, so if it doesn't detect a game properly the non-3D-cache chiplet isn't parked, threads might end up on it, and the latency of having threads spread out over two chiplets actually taxes performance a bit. The 7800X3D will realistically match or even beat the top 3D chips because it has one chiplet with 3D memory, so it will avoid any wacky performance hits and instability, thanks to a basic configuration that doesn't rely on the thread scheduler.
One thing to keep in mind with the evolution of 10nm intel parts is that the desktop parts have been progressively optimised more and more for clock speeds at every level. From the different layout of the cores to the in silicon timings they are built entirely around speed. If you take Sapphire Rapids and just clock up a single core, you don't get nearly as high as even Alder Lake despite it being the "same" Golden Cove core... Those server dies are just not pushed as far towards that end of the scale. Furthermore the whole 10nm process has been re-specced to lower densities to allow higher clocks and more manufacturability...
Jim you should do a video on Apple's silicon. I've always wondered if they can ever scale it up or if it's doomed to be constrained to mobile hardware.
Apple stuff is just constrained by the walled garden. The desktop socs are highly promising, but no one really bothers to make games on apple desktop silicon
I think (though I am not sure) that you can't have 2 separate caches at the same level and expect extra performance. Programs simply request memory and are not aware of where the data is. If the data is on the wrong cache chip (maybe half the time?), they not only take a miss checking that it's local, they then have to go over to the other chip, which is far away. It's as if the cache were at a different level whenever it's not the locally bonded one. If there is only one cache chip, there is no extra penalty of having to check whether the data is local or on the other die. We know running games on the die that doesn't have the cache chip nearby suffers a performance hit, and you should only run games on the first 8 cores, as giving them the extra cores with the slower cache access does nothing. To achieve good performance, the connectivity to the cache needs to be improved. Much like sticking a second GPU into your computer doesn't double the available video RAM.
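To put toy numbers on that argument, here is a minimal Python sketch; every latency below is an illustrative placeholder (not a measured Zen 4 figure), and `effective_latency` is a hypothetical helper, not anything from the video:

```python
# Back-of-the-envelope model of the comment above: if a second L3 slice
# lives on the far CCD, some fraction of lookups pay a local miss plus a
# cross-die hop. All numbers are made-up placeholders for illustration.

LOCAL_L3_HIT_NS = 10.0    # hypothetical local L3 hit latency
CROSS_DIE_HOP_NS = 70.0   # hypothetical CCD-to-CCD round trip
DRAM_NS = 90.0            # hypothetical DRAM access after a full miss

def effective_latency(p_local_hit: float, p_remote_hit: float) -> float:
    """Average latency when a request hits the local slice, hits the remote
    slice (after a failed local lookup), or falls through to DRAM."""
    p_miss = 1.0 - p_local_hit - p_remote_hit
    return (p_local_hit * LOCAL_L3_HIT_NS
            + p_remote_hit * (LOCAL_L3_HIT_NS + CROSS_DIE_HOP_NS)
            + p_miss * (LOCAL_L3_HIT_NS + DRAM_NS))

# One big local slice vs. the same total capacity split across two dies:
print(f"single slice: {effective_latency(0.80, 0.00):.1f} ns")  # ~28 ns
print(f"split slices: {effective_latency(0.40, 0.40):.1f} ns")  # ~56 ns
```

Even with the same overall hit rate, the split configuration comes out well behind once half the hits have to cross the die boundary, which is the gist of the comment.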
Great video Jim! One thing I always go back to with Intel is that they're still competing with AMD using inferior transistor density. Intel is essentially competing against Zen 4 using the equivalent of Zen 2 manufacturing technology. Given the handicap, they're actually doing quite well. The fact that they're even in the conversation in terms of absolute performance speaks to the quality of their processor designs, at least on desktop. (Mobile loses due to inferior efficiency, and server loses due to inferior transistor density prohibiting higher core count parts). If Intel ever gets their act together and catches up to TSMC in manufacturing, it could get VERY interesting. But it seems like either their design team or their manufacturing team drops the ball every generation without fail.
Honestly I feel like I have more of an idea of what Zen 6 will be like than Zen 5. I've no real idea what the cache arrangement will be for Zen 5, or if they will actually switch to 16-core CCDs (remember RedGamingTech also claimed that Zen 5 CPUs would have Zen 4c cores on them, which at the very least we now know is wrong). By Zen 6 it seems almost certain to me that they will use 16-core CCDs, and likely they'll move all the L3 cache off-die, since 2N will be an obscenely expensive process, it won't benefit SRAM scaling at all, and dedicated SRAM chips can apparently get around double the density.
Jim never says how many cores Zen 5 will have, just guesses at 12- or 16-core CCXs (I hope it's at least 12, because a 12-16 core CCX with V-cache would be amazing).
I suppose it all comes down to Granite Rapids. That'll be the bellwether for whether they can turn the ship around. I'm almost rooting for them. Power efficiency matters to the more important sectors (mobile and server) more than desktop, so that's a worthy trade-off. Funnily enough it might be their design team, if anything, that lets them down there.
" power efficiency matters to the more important sectors (mobile and server) more than desktop" - speak for yourself there bud. I live in a hot and humid climate and the latest Intel chips can turn my house into an over damn near at idle. I just got a 5800X3D to replace my 3600 non-X and under volted it to a flat 1 volt. The thing never sees above 45*C when playing any game and idles between 26*C and 32*C (based on room temp). The hottest I've been able to get it was 61*C on a 3 hour Aida64 test, once you under volt the X3D chips they run cool. Linux just dropped a video with a 5000watt chiller on a 1300k and still got the chip up into the high 70*C. The upper Intel chips can't even be cold by high end air coolers when put under a steady load.
@@TdrSld That's because the CURRENT Intel chips are ludicrously inefficient (10nm vs 5nm lol), whereas the X3D line of chips is ludicrously and awesomely efficient, even compared to the rest of Zen (which is ALREADY more efficient). What I am saying is that if Intel were to take a page out of AMD's book when it comes to efficiency, including using a more efficient node (although in this case they would sacrifice clock speed, unlike AMD), it would be a good thing. I actually agree with you. Intel does not CURRENTLY have good efficiency, but will get it with the Intel 4 node, is what I am saying.
Now, if AMD could just make a 16-core/32T SoC with 64GB RAM and an RDNA2/3 GPU that could be put on a tiny motherboard, it would sell like hot cakes, as the performance would be unearthly.
@@onomatopoeia162003 That's not so much an AMD thing; if Microsoft or Sony wanted them to make a more expensive 16-core APU, AMD would gladly sell it to them. The extra price just isn't worth it for the consoles, since they want to sell those as cheap as they can; most of their profit is actually from selling games.
I don't believe they can get both an IPC increase and a doubling of cores per CCD at the same time; they'll have to do one or the other. All I know is that Zen 5 is going for a wider design, and even if they manage to get 16 cores on a CCD, it must be unified and not 2 8-core CCXs (a 12-core CCX might not be out of the window, though), so I still think they'll keep the 8-core design for now.
What I am looking for, though, is cross-communication between the 2 CCDs via a bridge instead of passing data through the Infinity Fabric, which would greatly reduce the latency penalty when jumping to another CCD. Ideally it would also mean that both CCDs would have access to each other's L3 and make a sort of "virtual L4" a la IBM; add V-cache on top of that and you're looking at something pretty neat. That's what I think Zen 5 will provide, and to me it addresses the biggest weakness left in the chiplet design. They could also greatly improve the Infinity Fabric capabilities, but I doubt that would reduce latencies by a significant margin.
I might be pretty conservative, but I think that's what's most plausible to expect from Zen 5. I would be very surprised if they managed to cram more cores per CCD.
I want Intel to get it together. I think they'll be OK when the fabs are done. We need more competition and more companies in the space, not less. If they keep bleeding money, you'd better believe AMD will bleed us dry if they're the only ones in the space.
Well, I disagree; I think they're both doing it now. Looking back, how much money did these CPU tech companies get through the taxpayers' stimulus package? I myself don't play these games: if I give you hard cash for R&D, I expect a lower price in exchange for my purchase.
Hey Jim ... regarding 3D V-cache and the 7950X you overlooked a couple of things, and regarding Zen 5 you actually missed something entirely:
- No game actually NEEDs more than 8 cores, so 3D V-cache on a single die is all that gives you actual benefits in games.
- Almost no production application responds to the massive L3 cache amount, but they clearly react to the loss of clock speed on the 3D V-cache die. -> Putting a cache die on both CCDs would have brought down production app performance while gaining nothing in gaming -> it would have been a worse product.
- Already with Zen 4 there are 2 designs: Zen 4 and Zen 4c ... the latter optimized to be compact in die size but achieving lower clock speeds and maybe having less cache -> Zen 4c could be seen as an "efficiency core" variation of Zen 4.
- If full-performance Zen 5 really will achieve 22-30% higher ST performance compared to Zen 4, then this WILL cost die space. There is NO WAY 16 of these cores will be packed on a single CCD ... for segmentation reasons alone this would be way too much. Even 12 cores would be too big. The area advantage of the smaller node will be eaten up by the area needed to achieve this MUCH higher IPC/ST-perf. Just look at the area used by Intel's P-cores ... yes, they pack a punch in ST, but at the cost of area and power!
- I can see a Zen 5-gen product with an 8-core performance Zen 5 CCD and a 16-core efficiency Zen 5c CCD ... the latter cores might still reach the performance of a Zen 4 core ... with 16 of them the production performance would be absolutely bonkers ... on top of the 8 full-fat Zen 5 + 3D V-cache cores.
My bet: Zen 5 high-end will be an 8C+16c core configuration with 3D V-cache only on the full-fat cores.
I think the reason AMD only put the 3D Vcache on 1 of the 2 CCDs for the 7900X3D and 7950X3D is because the non vcache cores can clock higher than the vcache cores so applications that benefit more from higher clock speed and don't need as much cache are faster on the non vcache side. The problem is task scheduling sometimes gets it wrong and sends some or all of the threads to the wrong cores. So AMD would probably have been better off having a 7800X3D for gaming, and a 7950X3D with Vcache on BOTH CCDs for people who want to use it for gaming and productivity apps that need more than 8 cores. The 7950X (non 3D) still exists for people who run productivity apps that care more about clock speed than cache. I think the performance hit for task scheduling misses makes it not worth splitting the cache. That's the reason the simulated 7800X3D is usually faster than the 7950X3D. There are no misses because the other cores are parked. Maybe the task scheduling issues will get worked out at some point, but the workaround of basically turning off half of your cores fairly often seems like a pretty big waste of money. If I'm going to turn off 8 cores I might as well get the 7800X3D and save a few hundred dollars. If I need more than 8 cores then I either want higher clocks on all of the cores or I want higher cache on all of them.
Would be interesting to analyze whether there would be a benefit to having an L4 cache on the I/O die, or as a separate chiplet on the substrate. SRAM, DRAM or HBM would each have their own benefits and drawbacks, but regardless, that would significantly reduce the usage of RAM and therefore reduce its latency penalty, especially when you can have 1GB+ of L4 cache.
I'm curious as to how much DRAM you could physically stack on the IO die. If you could put 32 GB there (stacked several layers deep?) you could eliminate the need for DRAM on the motherboard.
@@edgarrice Exactly, that would kinda be the death of motherboards as we know them today, as Jim said in this video :D They (motherboard makers) could probably turn into just "back-plane" makers.
The 7980XE's price is a good example, and then boom, the 10980XE's price - whoopsie, what happened, Intel? I still upgraded from my old 2500K to a 13900K last month, as I needed the performance, RAM speeds and stability after having so many problems with AMD and RAM. I've stayed on Sandy Bridge for 12 years, so upgrading to a dead platform isn't really an issue for me.
"power-perf" - perhaps performance at fixed power? - it could be that higher frequency scales much worse with tiles due to higher impedance between tiles, so low frequency looks good, but high frequency all the power goes into losses on communication between tiles.
In my opinion, the chiplet-to-chiplet latency can be solved by proper scheduling of threads by the operating system. I believe the entire software universe should be developed from the ground up with awareness of 10+ cores and E-cores in the pipeline.
AdoredTV changed my life, when I saw his Zen moment from AMD i put all my savings in AMD and could finally buy a house. Cheers lad.
I told several people, because I had no money at the time. But they didn't listen to me, and they're kicking themselves now.
Wow
I put about $5,000 into AMD in 2018 at $19 a share because of the comprehensive coverage Jim does. That $5,000 of AMD is basically the only reason I haven't come out of the stock market with losses lately.
Very plausible.
If you went in with AMD as they released Zen 1 (2017-2018) and bought stock at around 10€, then you've multiplied your money by 8 by now (ignoring the 120€+ spike in December 2021).
Maybe a silly question, but if I buy a share for, let's say, 10 bucks, then later sell it for 20 bucks, I literally make 10 bucks profit? Is it really that easy? Can't be right.
I think that one thing that is missing from the 3d v-cache analysis on the second die is that on top of the bonding cost there's also a bonding yield. Like AMD's Gabriel Loh said, with many more chiplets there's the increasing risk of throwing away the whole building just because the escalators don't work.
I noticed this was missing too. Though I hope that, because the chiplet is so cheap, even at a 50% yield rate it's still an economical choice overall.
@@sawyerbass4661 If the bonding process fails the entire chiplet is lost, and if the defects are discovered later in the packaging process, then you've lost an entire CPU. Two CCDs, an IO die, the cache die, the substrate and the cost of packaging all lost to waste. There's more at stake in using two CCDs than an extra $20 for the chiplet and bonding process. The real question is whether the packaging yields are so abysmal that the CPUs lost with 1 vs 2 cache dies sufficiently alter the calculus to justify the heterogeneous design.
If you take the $50-100 price premium between the 7700X and upcoming 7800X3D as representative of the additional cost in materials, packaging, and amortized yield losses, and assume AMD is making the same or better margins on their 3D parts (~45%), then it costs them about $25-50 to add a cache die to a CPU. A hypothetical 16-core 3D part with two cache dies probably would've been given an MSRP of $799 vs $699. For that $200 vs the 7950X, you'd get a part that was worse in every way except its gaming performance. AMD probably took one look at that assessment, said "fuck that shit," and never looked back. Rather than an example of bean-counting, it looks to me like AMD picked a middle ground, and ended up doing neither very well.
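As a quick sanity check on that margin estimate, here is a minimal Python sketch; the premium and ~45% gross margin are the commenter's own assumptions, nothing official from AMD:

```python
# Rough sanity check of the math above: at a given gross margin, the cost
# behind a retail price premium is roughly premium * (1 - margin).

GROSS_MARGIN = 0.45  # commenter's assumed gross margin on 3D parts

def implied_cost(price_premium: float, margin: float = GROSS_MARGIN) -> float:
    """Cost of goods implied by a price premium at a given gross margin."""
    return price_premium * (1.0 - margin)

for premium in (50, 100):
    print(f"${premium} premium -> ~${implied_cost(premium):.0f} added cost")
# -> ~$28 and ~$55, in line with the $25-50 estimate in the comment
```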
@@sawyerbass4661 But doesn't it mean, discard the entire chiplet+cache assembly?
@@LtdJorge I think it does, because of the hybrid-bonding process - once you have fused them together, the change is irreversible. Entire assembly needs to be scrapped if it fails.
Don't they have to sand down the chiplet first before bonding the 3D V-cache anyway? So just send it back to be polished and try again?
The financial problems Intel are now having were delayed by a full two years thanks to the demand for chips sending prices through the roof.
Actually it was mostly due to Intel having 95%+ of the server market in that period, plus the vulnerabilities making companies buy more Intel servers, as they had already committed to Intel's platform and weren't due to change soon. But now AMD has penetrated the server market.
@@rattlehead999 Yeah, when I saw AMD's server market share in 2020, I was like WHATTTT??? Like only single digits.
Now they're making a big comeback.
Yup … Ironically enough, the very money Intel wasted for easily over a decade (e.g. the mountain of $140B+ for share buybacks, the $20B+ for their joke of a modem, the $12B+ for trying to kill ARM in the mobile space with their inferior Atoms, the $6B+ for Dell to prevent them defecting to AMD and shipping anything AMD, the $10B+ for Optane et al.) is the very money they're running low on today - the very time-frame when liquidity is NOW the key to their own survival. _How funny is that?!_
Sweetheart Carmen is so giving to Intel, it melts my heart! ♥
Or was her name Karma? Don't remember … Anyway, Carmen is just so precious, though she can be a real b!tch! I like it. ツ
Truth is stranger than fiction - _The best jokes still get written by life itself._
Intel has been one for a decade straight …
@@rattlehead999 Essentially they traded some enterprise market for some consumer market, pulling a big lead in the mobile space. Not exactly an ideal combination for income though lol, the consumer market might as well be funded by the enterprise market.
@@RyTrapp0 Intel has always had the lead in the mobile market.
The consumer market is a tiny portion of the income, the server market is where the overwhelming portion of the money is.
28:06 If I remember correctly AMD had the means to implement V-Cache since Zen 2 actually. But I don't think it was feasible enough and the TSVs got removed.
Which makes you think how far ahead AMD plans, since they could have released a V-cache chip back in 2019.
When Zen 3 X3D came out, someone did a die analysis of their regular Zen 3 die and found that, yes, the TSVs are there. And Tom from Moore's Law is Dead has a source that told him, "Oh yeah, they're in Zen 2 as well, AMD just couldn't get the V-cache working in time."
According to that source, Zen 3 V-cache was basically the prototype of what V-cache will be moving forward, and based on the changes to TSV arrangement Jim is showing in this video, that seems to be true. If AMD can get this stuff to the point where the voltage required to hit 5.8+ GHz can be run through a chiplet with V-cache without frying said cache, you'll have a true no-compromise CPU that is the undisputed king of both gaming and productivity.
@@benjaminoechsli1941 that's a world some people don't want to see. I say the market deserves it.
Zen 3 is basically "just" a Zen 2 with better Cache distribution, so yeah!
The explanation given when they announced V-cache was that getting the bonding and packaging yields high enough is what delayed the tech. Suppose 2 CCDs & 2 L3 dies result in 50% wastage: you'd need a large premium to make up for the CPUs you can no longer sell, for each hypothetical 3700X3D or Rome-X V-cache part you produce.
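A minimal sketch of that wastage math, treating each hybrid-bonding step as an independent coin flip; the per-assembly cost and the yield values are illustrative assumptions, not real TSMC figures:

```python
# If each hybrid-bonding step succeeds with probability y, a one-cache-die
# CPU survives with probability y and a two-cache-die CPU with y**2; the
# cost of scrapped assemblies must be recovered on the parts that survive.

ASSEMBLY_COST = 150.0  # hypothetical cost of CCDs + IO die + cache + packaging

def cost_per_good_unit(bond_yield: float, cache_dies: int) -> float:
    """Effective cost once scrapped assemblies are amortized over good ones."""
    survival = bond_yield ** cache_dies
    return ASSEMBLY_COST / survival

for y in (0.95, 0.85, 0.71):  # 0.71**2 ~ 0.50, the "50% wastage" case above
    one = cost_per_good_unit(y, 1)
    two = cost_per_good_unit(y, 2)
    print(f"bond yield {y:.2f}: 1 cache die ${one:.0f}, 2 cache dies ${two:.0f}")
```

At a 95% bond yield the second die barely matters, but at 71% the two-die part costs roughly double per good unit, which is the premium the comment is getting at.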
Take as long as you need Jim. I will always appreciate any content you release because it is free. Always put yourself first.
Sorry, but what about "Italy first"? Does he come after?
Couldn’t agree more. Quality always trumps Quantity
Alright guys
How's it going
Sorry for the wait
Not alright, it’s “owrite”
Hawzit goin'
I’ll catch you later, guys.
There's very few RUclips channels that make me drop everything I'm doing to watch the new video when it's published. This is one of them.
The Intel video sounds super interesting too. Really seems like the waves from the Ryzen earthquake are turning into a tsunami for Intel.
The only remaining one for me with the bell.
Never apologize, Sir! It's just good to hear from you once again.
Yeah, I got drawn in to the HEDT platforms from Intel. Hell, I am still on X79, running Xeons.
Great work as always, Jim. All the best, and God Bless. o7
Until recently I was on x79 too (3930k@4.4ghz). I just jumped up to a 7950x and it's a significant upgrade, but I doubt it's going to last me the 12 years that the last one did.
@@fuzz11111111 remember that older Intel CPUs lasted for very long because Intel dragged their feet in terms of innovation, as they had no competition. Right now, competition is extremely fierce.
@@LtdJorge oh yeah, I remember it took about 5 years before I even felt there was a significant upgrade possible.
@@fuzz11111111 good old 4 core decade 😂
@@LtdJorge was a 6 core decade for me, I had a 3930k which is part of what kept that computer going so long for me. Having the 40pcie lanes helped too (as it gave me the ability to put in a modern NVME drive alongside the GPU with both running full speed, I had to mod the mobo bios to be able to boot off that NVME drive mind you, but it wasn't actually that hard).
For those 12 years I also had an X58 based computer as a secondary system (mostly my wife's), which was almost as fast as my main one (once I put a 6 core xeon x5660 with a 4.3ghz overclock anyway).
I have to say that HEDT not being a realistic option this time around was a bit sad, as it's part of what made my previous 2 computers last so damn long (and it was annoying having to take the downgrade in PCIe lanes, especially when almost every X670E motherboard manufacturer allocates lanes as if the only thing that can go in a PCIe slot is a GPU, so the remaining good lanes go to M.2 slots, like we all need to run 4+ M.2 drives at full speed).
In my opinion, the graph at 13:00 is way more of a testament to the relevance of the 5800X3D than anything else, especially for someone who already has a decent B450 board or better.
Even with a shitty $30 A320 board, the 5800X3D is a godsend.
@@BramdyBrandy thanks to the fact that Zen 3 sips power.
In the current state of things, the 5950X is still the best CPU for the wattage it consumes. Now I imagine the Zen 5 16-core CCD to be a 5950X on PEDs. And when we see how good the 5800X3D is with just added L3 cache, I just wish AMD could also give us an SoC with everything on it - you couldn't customize it, but a Zen 5 SoC with 16 cores/32 threads, 64GB of RAM and an RDNA3 GPU, hooked onto a tiny board with 2 x 10GbE, 4 x USB Type-C and a sound card? Where do I sign?
@@Traumatree That is right up until you put a 7950x in eco mode at which point it absolutely stomps the 5950x in efficiency.
@@PineyJustice AMD played by Intel's rules: cranking the power up to 11 for the last 5% of performance.
These are the videos I miss. Nice to have you back, Jim.
When I was in hospitals and recovering for months from chronic health complications, I discovered your channel. I ploughed through every episode while you were still releasing regularly, and it got me into industry analysis and the game of inside leaks as a community in tech.
All this to say, I'm always grateful to see a new video, and I seriously appreciate how much work you put into building this labour of love of a channel. I'm glad to hear from you here and there, even if not like before; I love these long-form editorials so much.
Take care Jim, and thank you for everything.
-Kris
Sorry to hear that, hope you're feeling better. You might like to check out Moore's Law Is Dead especially the Broken Silicon podcasts here on YT. God bless :)
A lot of ways AMD can go. They can go chiplets, using Zen 4 cores as the E-cores and the higher-IPC Zen 5 cores as the P-cores. They could add 3D V-cache from the start and then add multiple stacks for higher-end parts. They could also add HBM to their products, reducing latency even further. A lot of ways to go.
The only issue with adding more stacks of V-cache on top of the chiplets is heat dissipation - hence why the frequency is lower on X3D chips versus non-X3D chips: more layers, lower clocks, and loads of games and applications prefer frequency over cache size. There is another option, however, which is being used for the MI300 APU: cache on the bottom with the compute dies on top. That resolves the heat dissipation issue but requires more complexity to allow voltage to travel through the cache dies to the compute dies - which is possible, hence the MI300 APU, but will be more expensive, at least to begin with.
Zen 4c, iirc, is half the die area of Zen 4.
@@Soutar3DG The strategy would be to go wider, with parts of the CPU running at sweet-spot frequencies (like laptop or server); the asymmetric approach means the fast logic can be non-stacked. I could imagine a basic 4-core block in an L4 cache chiplet so the scheduler can put whole CCXs to sleep when load is light, while the high-performance CCDs are tuned either for many cores/light cache or fewer cores and full-fat caches.
Try setting the base clock on an X3D down a little and turning off boost, it turns in respectable performance in benchmarks at laptop like CPU power levels.
According to Robert Hallock of AMD, it was voltage limitations which required a small cut to max boost frequency; my 5800X3D manages 4.55GHz peaks on at least 4 of the cores at times, not much off the 4.7GHz of the part without V-cache, which is reached for very little of the time anyway. The thermals are actually better in practice than my carefully tuned & undervolted 5600X with the original stepping, because going out to DRAM costs power; V-cache and 2 extra cores just mean lower power density. The Zen 4 & Raptor Lake launch reviews showed the diminishing returns of "speed demon" BIOS settings: you can reduce the TDP plenty with a very slight drop in performance. These long-running frequency-preferring programs are doing what, exactly? A game losing a couple of fps isn't serious; if you're in fast frequency-trading algorithms then reducing latency is key, and most high boosting relies on gaining thermal headroom through lower utilisation. All the long-running slow jobs I had for the computer are best handled by starting them and then doing something else, rather than staring at an hourglass icon.
It's all too evident that high frequency comes with very high power use; halving TDP only loses a few % in reviews. So the problem isn't "prefer frequency" - what you really mean is frequency * IPC. That's why pipelined micro-ops are used, so a sequence of instructions can run simultaneously; running slower but with more IPC has been delivering performance while clocks crept upwards more slowly.
Bigger caches never increase performance until your code spills out of the smaller one. It's the same with games and VRAM: when you exceed the capacity, performance suddenly tanks.
@@Soutar3DG The reason for lower frequency is that the V-cache chiplets can't take as high a voltage, and you need to drop the clocks to drop the voltage. There is a cooling factor too, which you can see since all the Zen 4 X3D CPUs have something like a 5-7°C lower max temp rating, but the main issue with them is the voltage. If cooling were the issue, they would let the cores use as much voltage as they want, since CPUs measure core temp anyway, so all it would do is act as if you had a worse cooler. Peak boost clock speeds would still be high.
Question is if they could increase the GPU cores in the IO chiplet and add MCD chiplets as well to speed that up
The king is back !
Hey Jim, it's so nice to hear your voice again.
Interesting as always! Days become special with a new video from Jim. We appreciate that you took the time for another video! 😊
I suspect the reason to not go double VCache was the extra latency from Die to Die killing plenty of the benefits.
I presume the whitelisted die toggling via the Xbox Game Bar implies there is currently no simple, elegant solution.
What if they go crazy and put one on the bottom and one on top?
Not only that but due to V-cache being temperature sensitive, they wouldn't be able to push the power envelope of the die near what the non-VC die can handle.
Extra die-to-die latency still exists on the non-V-cache parts, so a 9+ core app on the 7950X suffers the same fate. For games it's a non-issue: few games will spin up that many threads, and even those that do are only getting small utilization/benefits, usually not worth it to cross the barrier. Being limited to 8 cores with the V-cache benefits usually dwarfs the loss of those few extra threads. I'd say the clock loss might be higher up on the priority list. Otherwise there is no *additional* downside to adding V-cache to both sides, versus a chip with no V-cache at all.
Regarding the asymmetric layout of the 7950X3D: I have a 5950X for work and play, and I already launch games with a script that locks them to my best CCD. I run Linux exclusively, but you can do the same on Windows. As you showed in the CCD-to-CCD latency chart (and as documented by every review benchmark), the built-in scheduling on Windows and Linux is still not perfectly tuned to AMD's approach, but fixing it manually is absolutely trivial for us nerds.
So: Yes, I'm going to swallow the 7950X3D hook, line and sinker. The only difference I have to implement is testing which games like frequency over cache and lock them to the relevant CCD. I already have three 1440p 27" monitors and the visual upgrade of 4K is just not worth it, especially not in games that have a good ghosting free TAA implementation. I can borderline not tell the difference unless we're talking bigger monitors. For that reason I'm going to pair the 7950X3D with a 7900XTX which runs RT at 1440p at perfectly acceptable frame rates even without upscaling (at 1440p the 7900XTX has about the same RT performance as the 3090ti).
And this gets me a system that does what I want: if I want a compile job to finish fast, I give it both CCDs and go grab a snack. If I want to game while it's compiling, the game gets the X3D CCD and the compile job gets the vanilla CCD. This kind of CCD allocation already works fine on the 5950X, but soon I'll be getting the gaming performance of the 7800X3D while "a 7700X with better clocks" handles the compile jobs.
So from my perspective AMD did the right thing. Many things don't benefit from the extra cache and would rather not have the frequency loss associated with the "warm blanket of V-cache", so to me it actually IS the best of both worlds. The usability experience just happens to be a bit questionable, but that's life with AMD.
Well put!
Yes, the flexibility really does seem nice, best of both worlds. I wonder if it's even a problem the scheduler can solve "automagically", without doing the same thing you have to: investigate which CCD you want your app on and assign it via script. Even Intel's more sophisticated Thread Director on Windows 11 can still run into situations where the end result is not what the user wanted.
How did you script it to load games on specific cores in Linux, and can that benefit older Intel CPUs? I hadn't even thought about it but want to try it out.
@@tacticalcenter8658 Read the manpage for taskset (part of util-linux) and everything will instantly make sense. There is no point in doing it on homogeneous core layouts like pre-E-core Intel and pre-CCD AMD chips.
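For anyone who wants to see the idea spelled out, here's a minimal Python sketch of what `taskset -c 0-7 <game>` does on Linux. The assumption that logical CPUs 0-7 are the preferred CCD is just that, an assumption; the numbering varies by chip and SMT layout, so check `lscpu -e` or lstopo first:

```python
#!/usr/bin/env python3
# Minimal sketch of pinning a game to one CCD on Linux (same idea as taskset).
# Assumption: logical CPUs 0-7 belong to the CCD you want; verify with `lscpu -e`.
import os
import sys

TARGET_CCD = set(range(0, 8))  # logical CPU ids of the preferred CCD

if len(sys.argv) < 2:
    sys.exit(f"usage: {sys.argv[0]} <game> [args...]")

os.sched_setaffinity(0, TARGET_CCD)   # pid 0 = this process
os.execvp(sys.argv[1], sys.argv[1:])  # exec the game; affinity is inherited
```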
Always happy to hear from you Jim! No matter how much time in between. Just take care of you and I'll take what you can give. Thanks
Ah! I missed that so much! Glad to have you back here!
Zen 2 EPYCs had TSVs, so that's when they had the idea and the possibility. But they had to work out the kinks for mass production.
Morning Jim! Great to hear from you!
Look, Adored have come to see us.
I'm so, soo happy to see you making another one
I guess it's not the $5 for a second V-cache die, it's the loss of the 5.7 GHz turbo boost on the non-V-cache die. That's 500 MHz higher than on the V-cache die and makes a noticeable difference in those apps that run primarily on one core, like Photoshop. The 7950X3D is not a gamer CPU; it's a CPU meant for those who want 16 cores for apps *and* the 3D V-cache boost in gaming. If a second 3D V-cache die delivered another 5-10% of gaming performance, AMD would have done it.
I like Jim and his videos, but to me his take on 7950X3D seems pretty misguided. I'm confident the 7950X3D we got is a far better product than a dual V-cache variant for most situations if they can just nail the scheduling or give more control of that to the users (e.g. Factorio not using the V-cache CCD, or CSGO not using the frequency CCD)
@@maxbirdsey7808 Right. Hardware Unboxed did a great analysis of whether certain results are a scheduling issue or not by disabling the non-V-cache die entirely for part of their testing. Night-and-day difference in Factorio.
And just about every review I've seen shows that like with the 5800X3D, Zen 4 X3D chiplets suck at productivity. The hybrid approach really was the best of both worlds for the 7950X3D, as you lose less productivity performance (something Jim claims to care about, and should as a creator) than you would with two V-cache dies, while having noticeable gains in gaming.
@@benjaminoechsli1941 Most reviews only check a limited variety of productivity tasks, though. There are applications that do benefit from the extra cache.
Current CPUs are fast enough that no GPU can saturate them. No gamer will feel a 500 MHz drop.
@@YlkevanSpankeren Not in the real world, where you use at least 1080p, but in CS:GO there will be a difference; let's say 1200 FPS instead of 1100 at 720p.
Besides: with my 5800X3D there was a noticeable boost vs the 3900X in Assassin's Creed Odyssey. It went up to 135-140 FPS instead of 105-110 at 1440p in the benchmark; the 3900X could not saturate my 6800 XT.
Thanks!
I wonder if consumer CPUs will ever get quad-channel memory; memory bandwidth seems to be the bottleneck for many workloads, not just the latency.
Isn't quad threading on the agenda? I'm sure I heard somewhere it was, which may possibly add that possibility?
@@milkonbean Xeon Phi processors have quad threading, but I doubt it'll be on consumer chips any time soon.
The 10900X had quad-channel memory.
I had an i7-3820 with quad-channel memory... This was Intel's E platform, which they ditched. Threadripper is also quad-channel and could be called a consumer CPU.
Anything DDR5 is technically quad-channel when used in pairs, because each stick carries two independent 32-bit subchannels (though the total bus width is the same as dual-channel DDR4).
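The channel math behind that comment, for anyone counting along (per the DDR5 spec, each DIMM carries two independent 32-bit subchannels):

```python
# DDR5 splits each DIMM into two independent 32-bit subchannels,
# while a DDR4 DIMM is one 64-bit channel. Same total width, more parallelism.
ddr4_bus = 2 * 64      # two DDR4 DIMMs -> 128-bit bus, 2 channels
ddr5_bus = 2 * 2 * 32  # two DDR5 DIMMs -> 128-bit bus, 4 subchannels

print(ddr4_bus, ddr5_bus)  # 128 128
```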
Good to hear from you Sir!
Ahhh yes, there is nothing like hitting the Jim on a morning. Blessings bro! Nice hearing "alright guys". Lol.
I do wonder if AMD would make an L3/L4 cache by stacking the 3D cache chip across the core chiplets (so one cache chip would be connected to both core chips)
They're sort of doing that with the MI300 APU: the cache is the bottom layer with the compute dies on top. It resolves the heat dissipation issue so clocks can still be higher, but requires more complexity to get the voltage for the compute dies through the cache dies. I can imagine that solution making its way to desktop chips once the cost comes down.
@@Soutar3DG Yeah, I was excited when that thing came out. They've been working on this fusion concept since, what, 2010? 2008? And it's finally coming into reality.
@@benjaminoechsli1941 no one wanted to believe it, but I always did think that was the future. It's becoming more apparent now.
OMG , welcome back! We missed you Jim!
Great work Jimbo!
Really hard to know who is at fault - the foundries or the design teams. It always seemed to be the former, but lately more and more sources claim the foundries are doing better than expected and it's the design teams at Intel messing up.
I don't think AMD themselves have decided many of the things you mentioned. Either way, I am looking forward to the results. I would love a desktop chip with Zen 4c cores only, or at least in a year or two, Zen 5c cores only. I hope the efficiency gains will be enough for productivity that these non-gaming cores become a thing even for consumers, not only servers.
Yeah, if the design teams were used to the foundries being able to "bend" their process to help fix some of their issues, I could see that being a problem now that nodes have shrunk so much that the very laws of physics mean the foundries aren't able to be as flexible. If you are used to working with a third party, e.g. TSMC, then you know you can only count on them so much and have to do the rest yourself.
Zen 1 ended Intel's gaming dominance because it became the talking point of every gamer in the world. Benchmarks and all that don't matter when you're objectively making a better product that uses less power, costs less money, and has a viable path forward on a platform. For idiots who still think Intel is better, well, it's a shame to have 0 IQ, innit?
There are always people who only look at shallow statistics in every market, and who either don't care, simply don't think about longer-term implications, or are just not very good at cost-benefit considerations.
The smart money started flowing towards AMD in this case for sure. In the grand scheme of things, most of these benchmarks simply do not paint a realistic or even useful picture.
How many people game on dedicated, constantly fresh Windows installs, and how many people buy only the highest-performance parts for an entire build?
Add any applications running extra and it influences performance, generally shifting it towards higher-core-count CPUs. Skimp a few hundred bucks on the GPU and don't buy the absolute fastest available, and a 5 or 10% performance lead for one CPU or another can basically evaporate completely.
I have been hoping someone comes along and starts a channel dedicated to benching more realistic systems.
I see nobody testing which CPU would be the best combination with a 3060 GPU, for example. How much CPU power do you need to match a certain GPU?
Or which new-gen GPU upgrades make sense for a two- or three-year-old higher-end CPU. That kind of stuff.
I think this would be much more useful information for consumers.
When Intel can't compete on performance, it competes on price, and there will always be a market for cheaper parts. Nothing wrong with that.
It's very possible the stacking/bonding process currently bottlenecks production while process throughput is being scaled up. In other words, increased cost due to limited capacity on this specialized process, rather than material/silicon costs. This would explain some of the odd choices in both configuration and marketing speak. Just a guess though.
Being careful while learning the process is probably pragmatic too, once they have it well understood and fine tuned they can go ham with insane stacking arrangements and monstrous CPU / GPU units.
I'm happy with my Ryzen 2600x and Radeon RX 5700XT right now, but when I'm ready to upgrade again there are going to be some crazy options on the table...
I've been waiting for someone to drop some possible ZEN 5 details. Jim comes through once again!
I must say that the IBM PC standard needs a massive overhaul in itself. It's been 40-50 years, and it has served us well, but now we are reaching the limits of the platform.
Glad to have you back Jim
The ability to park the unused chiplet is such a huge advantage, not just in terms of power draw but performance as well. The chiplet without V-cache can clock higher than the other chiplet, making it better at many applications. So you get a choice between V-cache and higher clock speeds. Besides, most games don't need more than 8 cores, so there isn't much of a penalty to being limited to one chiplet.
Right. Creativity programs love clock speed more than cache, so a hybrid approach like this gives you the best of both worlds, like AMD said.
If it's done automagically then sure, but it does offer a quirky value proposition if you're buying a CPU where you get half the cores for gaming but all of them for productivity... (although Intel's small cores are basically that same value proposition).
Performance benefit of dual V-cache dies? Depends on your workload. One thing I noticed in your screenshots (for example at 9:44) is that the non-V-cache cores are running a full 1000 MHz faster than the V-cache cores (with the exception of core 0). So if you need high clock speeds for some workloads, the non-V-cache die actually does give you a performance boost. For pure gaming workloads, the 7950X3D is a waste of money in the first place; wait a couple more months for the 7800X3D. We have to consider the benefit of the higher clocks on the non-gaming workloads when we talk about overall performance of the 7950X3D.
I feel they are just holding back dual cache stacking in case Intel comes back with something, so they have an easy win by adding stacks to both CCDs on next-gen chips.
Perhaps it's better to wait a bit longer for more precise information about Zen 5, don't you think?
Great video, always interesting to hear your pov!
I’ve been waiting to switch to AMD but I want a proper workstation platform with motherboards that aren’t obscenely expensive
Great video. Thanks. Subbed.
I've watched for the last 25 years as "AMD set to end Intel's dominance" headlines have come and gone. That includes people telling me "this time is different" so don't bother.
It's as predictable and tiresome as "is this the year of the Linux desktop?" headlines at this point.
Seems a bit hyperbolic. Times are tough for tech, but Intel didn't get to today by not being able to compete.
Nice. Your last video talking about stagnation in RTX was quite trash but this is nice and back to form for you.
Time will tell.
22:40 But this will also increase core-to-core latency on the same chiplet, because there is a larger structure to navigate, just like cache latency increases with size? This would cause a drop in lightly-to-medium-threaded total IPC? V-cache would of course hide this, along with other improvements as well. I guess we can only speculate.
Eager for part 2! Thank you for this!
Great insight in your video as always! Welcome back :)
Just showing my love, great video.
Isn't die stacking effectively still a semi-experimental tech, with low order volume? If so, economy-of-scale principle: low orders -> expensive manufacturing. Then die bonding might be pretty pricey right now.
Yeah, Zen 3 X3D was basically a working prototype, with Zen 4 X3D being v.1.0. Looking forward to seeing if Zen 5 V-cache solves the voltage issue so we can run V-cache chiplets at closer to 6 GHz.
In theory that is true. AMD is past the inflection point where costs come down dramatically, though. They're doing tens of millions of chips annually (including Epyc) with V-cache already, and the packaging process is all automated because of the technical requirements. The cost isn't really going to come down significantly from what it already is. Now, the price they're paying TSMC and the margins they're capturing is a whole different discussion. I strongly suspect AMD is paying a pretty penny and TSMC is just using that pricing power to increase their own margins. That equation won't change with volume; it will only change with time, if TSMC gets some actual competition.
Before I even start, isn't Intel already on the ropes on the desktop?
It's for servers that the true Titan wrestling is happening...
>2 dual cores glued together.
Intel was still pissed at a similar remark about the Core Duo, I think, almost a decade later.
Their first glued-together chip was the Pentium D, which was two Pentium 4 dies placed on the same package. No communication between dies except through the front-side bus, which means it was actually SMP on a single socket.
The Core 2 Duo was a proper dual-core chip. But their Core 2 Quad chips were two Core 2 Duo dies on a single package, also effectively SMP on a single socket.
Calling those "glued together" is appropriate, given the lack of communication on the package.
AMD's multi-die chips, however, had very high-speed communication on the package right from the start. So Intel's attempted repeating of the "glued together" taunt they had received 10+ years earlier fell flat.
Like AMD, Intel proved it is an innovation powerhouse
I'm saving this for a bit of quiet time!
Love seeing another one of your insightful videos; I come here just for them. Please keep making them. Thanks!
Two V-cache CCDs would be slower than one in gaming because of chip-to-chip latency. You can see this with the 7700X beating the 7950X in many games, even with a lower max frequency and power limits.
What a great Friday!
I'd like to address the claims that games would benefit from a 7950X3D with two V-cache CCDs. From what I know, the V-cache CCD has much lower clock speeds, and this reduces performance in some games even with the extra cache.
So the reasoning for one normal CCD and one V-cache CCD is that a game will use whichever one brings it bigger performance. Essentially, two V-cache CCDs would give lower performance in total than the 1/1 split; it's not because of money.
When calculating the cost of adding the extra cache to both dies of the 7950X3D, you forgot to consider the opportunity cost. Securing wafer orders at TSMC isn't something done lightly; they may have been constrained by how many wafers they had ordered. They can sell a lot more 7950X3D chips if they cut the V-cache per chip in half, if those 7nm wafers are the limiting factor.
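A back-of-the-envelope sketch of that opportunity cost; every number here is an illustrative assumption, not an AMD or TSMC figure:

```python
# If 7nm cache-die wafers are the constraint, halving the cache dies per CPU
# doubles the number of X3D CPUs that allocation can support.
# All numbers below are invented for illustration.

wafers = 1_000                # hypothetical 7nm wafers reserved for V-cache
cache_dies_per_wafer = 1_500  # rough guess for a small cache die, ignoring yield

for dies_per_cpu in (2, 1):
    cpus = wafers * cache_dies_per_wafer // dies_per_cpu
    print(f"{dies_per_cpu} cache die(s) per CPU -> {cpus:,} X3D CPUs")
# 2 dies/CPU ->   750,000 CPUs
# 1 die/CPU  -> 1,500,000 CPUs
```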
7nm production is down, if I recall the recent news, meaning that theoretically there should be lots of excess capacity available. So I'm not sure 7nm volume was a limiting factor here.
So good to hear from you. I would love to hear you commenting on the videos from Coreteks - it seems like he is very angry at AMD.....
Very angry seems to be an understatement lol.
Not sure why though. Maybe something personal.
I don't understand why you say that 3D cache on both dies of the 7950X3D would be better. First of all, 3D cache on only one die is cheaper. Second, you now have better single-core performance, since the non-3D-cache die clocks higher, which is more important for some applications. Sure, there are scenarios where it chooses the wrong die, like in the Factorio benchmark, but that should hopefully get fixed.
Well put. Jim is a creator, and so productivity performance matters to him. He said so himself in this video. Why does he want V-cache on both chiplets when that would kill productivity?
About time, innit? Thought it would happen sooner
The legend has returned. And AdoredTV is back as well....
I'll agree with that when I see it myself. Not before.
I chose Intel as my computer never turns off, and that low idle power draw is a must. But I am worried future Intel chips using tiles will increase their idle power draw. AMD has high idle power draw due to their chiplet design.
In a few years, I'd welcome a small little board with a CPU socket, an NVMe slot, and a PCIe slot that you can just plug onto your freestanding desktop GPU.
The 7900X3D and 7950X3D actually park the non-3D-cache die, so when gaming it really is an 8-core with extra cache. The issue is that this relies on the OS's thread management, so if it doesn't detect a game properly, the non-3D-cache chiplet isn't parked, threads might end up on it, and the latency of having threads spread out over two chiplets taxes performance a bit. The 7800X3D will realistically match or even beat the top 3D chips, because it has one chiplet with 3D memory, so it avoids any wacky performance hits and instability: its basic configuration doesn't rely on the thread scheduler.
One thing to keep in mind with the evolution of 10nm intel parts is that the desktop parts have been progressively optimised more and more for clock speeds at every level. From the different layout of the cores to the in silicon timings they are built entirely around speed. If you take Sapphire Rapids and just clock up a single core, you don't get nearly as high as even Alder Lake despite it being the "same" Golden Cove core... Those server dies are just not pushed as far towards that end of the scale. Furthermore the whole 10nm process has been re-specced to lower densities to allow higher clocks and more manufacturability...
Jim you should do a video on Apple's silicon. I've always wondered if they can ever scale it up or if it's doomed to be constrained to mobile hardware.
Isn't it on desktops as well?
@@stephenxs8354 The biggest thing they've put Apple silicon in is the Mac Studio, which is another small-form-factor PC.
Apple's stuff is just constrained by the walled garden. The desktop SoCs are highly promising, but no one really bothers to make games for Apple desktop silicon.
Awesome video as always, AdoredTV team! Is it manufacturing issues Intel has now, or chip design, or both?
The laptop business is more important, as few people buy desktops; then data centres. Apart from gaming, V-cache is less useful elsewhere.
I think (though I am not sure) that you can't have two separate caches at the same level and expect extra performance. Programs simply request memory and are not aware of where the data is. If the data is on the wrong cache chip (maybe half the time?), they not only take a miss checking that it's local, they then have to go over to the other chip, which is far away. It's as if the cache is at a different level whenever it's not on the bonded die.
If there is only one cache chip, there is no extra penalty for having to check whether the data is local or on the other die. We know running games on the die that doesn't have the cache nearby suffers a performance hit, and that you should only run games on the first 8 cores, as giving them the extra cores with slower cache access does nothing.
To achieve good performance, the connectivity to the cache needs to be improved. Much like sticking a second GPU into your computer, you don't double the available video RAM.
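A toy model of that argument; the cycle counts and hit probabilities below are invented for illustration (real figures differ), but they show how splitting hot data across two same-level caches inflates the average access cost:

```python
# Toy model: average L3 access cost when hot data may sit on the *other* CCD.
# Cycle counts and hit probabilities are invented for illustration only.

LOCAL_L3_HIT   = 50   # cycles: hit in this CCD's (stacked) L3
REMOTE_PENALTY = 180  # extra cycles: miss locally, fetch from the other CCD's L3
MEMORY         = 400  # cycles: miss both and go to DRAM

def avg_cycles(p_local: float, p_remote: float) -> float:
    p_mem = 1.0 - p_local - p_remote
    return (p_local * LOCAL_L3_HIT
            + p_remote * (LOCAL_L3_HIT + REMOTE_PENALTY)  # check local, then hop
            + p_mem * MEMORY)

print(avg_cycles(p_local=0.90, p_remote=0.00))  # one big local cache:  85.0
print(avg_cycles(p_local=0.45, p_remote=0.45))  # data split across CCDs: 166.0
```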
Intel has Foveros for 3D stacking and PowerVia coming, so I think it's unfair to say they're not innovating in the same space as AMD...
Great video Jim!
One thing I always go back to with Intel is that they're still competing with AMD using inferior transistor density. Intel is essentially competing against Zen 4 using the equivalent of Zen 2 manufacturing technology. Given the handicap, they're actually doing quite well. The fact that they're even in the conversation in terms of absolute performance speaks to the quality of their processor designs, at least on desktop. (Mobile loses due to inferior efficiency, and server loses due to inferior transistor density prohibiting higher-core-count parts.)
If Intel ever gets their act together and catches up to TSMC in manufacturing, it could get VERY interesting. But it seems like either their design team or their manufacturing team drops the ball every generation without fail.
There is more to it than density, especially if your manufacturing practices mean you cannot capitalise on density with clock speed.
Honestly, I feel like I have more of an idea of what Zen 6 will be like than Zen 5. I've no real idea what the cache arrangement will be like for Zen 5, or whether they will actually switch to 16-core CCDs (remember, RedGamingTech also claimed that Zen 5 CPUs would have Zen 4c cores on them, which at the very least we now know is wrong).
By Zen 6 it seems almost certain to me that they will use 16-core CCDs, and likely they'll move all the L3 cache off-die, since 2N will be an obscenely expensive process, it won't benefit SRAM scaling at all, and dedicated SRAM chips can apparently get to around double the density.
Jim never says how much Zen 5 has in terms of core counts, just guesses 12- or 16-core CCXs (I hope it's at least 12, because a 12-16 core CCX with V-cache would be amazing).
Alright guys! Blessed words!
I suppose it all comes down to Granite Rapids. That'll be the bellwether for whether they can turn the ship around. I'm almost rooting for them. Power efficiency matters to the more important sectors (mobile and server) more than desktop, so that's a worthy trade-off. Funnily enough, it might be their design team, if anything, that lets them down there.
" power efficiency matters to the more important sectors (mobile and server) more than desktop" - speak for yourself there bud. I live in a hot and humid climate and the latest Intel chips can turn my house into an over damn near at idle. I just got a 5800X3D to replace my 3600 non-X and under volted it to a flat 1 volt. The thing never sees above 45*C when playing any game and idles between 26*C and 32*C (based on room temp). The hottest I've been able to get it was 61*C on a 3 hour Aida64 test, once you under volt the X3D chips they run cool. Linux just dropped a video with a 5000watt chiller on a 1300k and still got the chip up into the high 70*C. The upper Intel chips can't even be cold by high end air coolers when put under a steady load.
@@TdrSld That's because the CURRENT Intel chips are ludicrously inefficient (10nm vs 5nm, lol), whereas the X3D line of chips is ludicrously and awesomely efficient even compared to the rest of Zen (which is ALREADY more efficient).
What I am saying is that if Intel were to take a page out of AMD's book when it comes to efficiency, including using a more efficient node (although in this case they would sacrifice clock speed, unlike AMD), it would be a good thing. I actually agree with you.
Intel does not CURRENTLY have good efficiency but will get it with the Intel 4 node, is what I am saying.
@@abowden556 Agreed
fantastic video.
Now, if AMD could just make a 16-core/32-thread SoC with 64GB RAM and an RDNA 2/3 GPU that could be put on a tiny motherboard, it would sell like hot cakes, as the performance would be unearthly.
It'll be interesting to see if they stick to 8 cores on the consoles going forward. If they do, they'll probably just add more and better RAM.
@@onomatopoeia162003 That's not so much an AMD thing; if Microsoft or Sony wanted them to make a more expensive 16-core APU, AMD would gladly sell it to them. The extra price just isn't worth it for the consoles, since they want to sell those as cheap as they can; most of the profit is actually in selling games.
Those kinds of products are probably a lot closer than people think.
Look at the MI300 they recently unveiled. They're trying to get there.
I don't believe they can get both an IPC increase and a doubling of cores per CCD at the same time; they'll have to do one or the other. All I know is that Zen 5 is going for a wider design, and even if they manage to get 16 cores on a CCD, it must be unified and not two 8-core CCXs (a 12-core CCX might not be out of the question, though). So I still think they'll keep the 8-core design for now. What I'm looking for, though, is cross-communication between the two CCDs via a bridge instead of passing data through the Infinity Fabric, which would greatly reduce the latency penalty when jumping to another CCD. Ideally it would also mean both CCDs have access to each other's L3, making a sort of "virtual L4" a la IBM; add V-cache on top of that and you're looking at something pretty neat. That's what I think Zen 5 will provide, and to me this is the only big weakness left in the chiplet design. They could also greatly improve the Infinity Fabric's capabilities, but I doubt that would reduce latencies by a significant margin.
I might be pretty conservative, but I think that's what's most plausible to expect from Zen 5. I would be very surprised if they managed to cram more cores per CCD.
Too bad the RDNA 3 video died. I was looking forward to that after you talked about it on Broken Silicon.
Thanks Jim, excellent analysis as usual and well worth the wait as always.
Love your analysis, and so happy you're back. :)
We love you, Jim! 😃👍
I want Intel to get it together. I think they'll be OK when the fabs are done. We need more competition and more companies in the space, not fewer. If they keep bleeding money, you'd better believe AMD will bleed us dry if they're the only ones in the space.
Well, I disagree, I think they're both doing it now.
Looking back, how much money did these CPU tech companies get through the taxpayers' stimulus package?
I myself don't play these games: if I give you hard cash for R&D, I expect a lower price in exchange for my purchase.
@@AwesomeBlackDude Yeah, you're right.
Your videos are a real treat.
Hey Jim ... regarding 3D V-cache and the 7950X3D you overlooked a couple of things, and regarding Zen 5 you actually missed something entirely:
- No game actually NEEDS more than 8 cores, so 3D V-cache on a single die gives you all the actual benefits in games
- Almost no production application responds to the massive L3 cache amount, but they clearly react to the loss of clock speed on the 3D V-cache die.
-> Putting a cache die on both CCDs would have brought down production app performance while gaining nothing in gaming -> it would have been a worse product
- Already with Zen 4 there are two designs: Zen 4 and Zen 4c ... the latter optimized to be compact in die size but achieving lower clock speeds and maybe having less cache -> Zen 4c could be seen as an "efficiency core" variation of Zen 4.
- If full-performance Zen 5 really achieves 22-30% higher ST performance compared to Zen 4, then this WILL cost die space. There is NO WAY 16 of these cores will be packed onto a single CCD ... for segmentation reasons alone that would be way too much. Even 12 cores would be too big. The area advantage of the smaller node will be eaten up by the area needed to achieve this MUCH higher IPC/ST perf. Just look at the area used by Intel's P-cores ... yes, they pack a punch in ST, but at the cost of area and power!
- I can see a Zen 5-gen product with an 8-core performance Zen 5 CCD and a 16-core efficiency Zen 5c CCD ... the latter cores might still reach the performance of a Zen 4 core ... with 16 of them, the production performance would be absolutely bonkers ... on top of the 8 full-fat Zen 5 + 3D V-cache cores
My bet: Zen 5 high-end will be an 8C+16c configuration with 3D V-cache only on the full-fat cores
Well said!
Video uploaded on my birthday.
Best birthday gift ever
Another great video. Always an entertaining (and informative) listen. I'm wondering what the "Eureka" moment from 25:52 was.
I think the reason AMD only put the 3D V-cache on one of the two CCDs for the 7900X3D and 7950X3D is that the non-V-cache cores can clock higher than the V-cache cores, so applications that benefit more from higher clock speed and don't need as much cache are faster on the non-V-cache side. The problem is that task scheduling sometimes gets it wrong and sends some or all of the threads to the wrong cores.
So AMD would probably have been better off having the 7800X3D for gaming, and a 7950X3D with V-cache on BOTH CCDs for people who want to use it for gaming and for productivity apps that need more than 8 cores. The 7950X (non-3D) still exists for people who run productivity apps that care more about clock speed than cache. I think the performance hit from task-scheduling misses makes it not worth splitting the cache. That's the reason the simulated 7800X3D is usually faster than the 7950X3D: there are no misses, because the other cores are parked.
Maybe the task-scheduling issues will get worked out at some point, but the workaround of basically turning off half of your cores fairly often seems like a pretty big waste of money. If I'm going to turn off 8 cores, I might as well get the 7800X3D and save a few hundred dollars. If I need more than 8 cores, then I either want higher clocks on all of the cores or more cache on all of them.
Would be interesting to analyze whether there would be a benefit to having an L4 cache on the I/O die, or as a separate chiplet on the substrate.
Whether as SRAM, DRAM, or HBM, each would have its own benefits and drawbacks, but regardless, it would significantly reduce the usage of RAM and therefore reduce its latency penalty, especially when you can have 1GB+ of L4 cache.
I'm curious as to how much DRAM you could physically stack on the IO die. If you could put 32 GB there (stacked several layers deep?) you could eliminate the need for DRAM on the motherboard.
@@edgarrice Exactly, that would kind of be the death of motherboards as we know them today, as Jim said in this video :D
Motherboard makers could probably turn into just "back-plane" makers.
He’s back! Day made
28:43 What do you mean by "the end of the motherboard as we know it"?
The 7980XE's price is a good example, and then boom, the 10980XE's price. Whoopsie, what happened, Intel? I still upgraded from my old 2500K to a 13900K last month, as I needed the performance, RAM speeds, and stability after having so many problems with AMD and RAM. I stayed on Sandy Bridge for 12 years, so upgrading to a dead platform isn't really an issue for me.
"power-perf" - perhaps performance at fixed power? - it could be that higher frequency scales much worse with tiles due to higher impedance between tiles, so low frequency looks good, but high frequency all the power goes into losses on communication between tiles.
Thank you for making more videos!! LOVE IT
Good video as usual.
Can the non-V-cache chiplet be limited to background tasks via software?
That's basically what the scheduler does for Intel's e-cores, so yes.
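And beyond waiting on the scheduler, you can force it yourself today. A sketch using psutil; the process names and the assumption that logical CPUs 8-15 are the non-V-cache CCD are purely illustrative, so adjust both for your own system:

```python
# Sketch: push chosen background processes onto the non-V-cache CCD.
# Requires psutil (pip install psutil); may need elevated rights for
# other users' processes. Core ids 8-15 for the second CCD are an assumption.
import psutil

NON_VCACHE_CCD = list(range(8, 16))
BACKGROUND = {"OneDrive.exe", "Discord.exe"}  # example names, purely illustrative

for proc in psutil.process_iter(["name"]):
    try:
        if proc.info["name"] in BACKGROUND:
            proc.cpu_affinity(NON_VCACHE_CCD)  # pin to the frequency CCD
            print(f"pinned {proc.info['name']} (pid {proc.pid})")
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass  # process vanished or we lack permission; skip it
```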
Done? Probably not ...
Behind in the competition? Most definitely.
In my opinion, the chiplet-to-chiplet latency problem can be solved by proper scheduling of threads by the operating system. I believe the entire software universe should be developed from the ground up with awareness of 10+ cores and E-cores in the stack pipeline.