Sure as FUCK wasn't! ;) God bless AMD for keeping the pressure on the market - no CPU-manufacturer can sleep well as long as they're still executing - keep lighting that fire!
Once all the program code fits in the cache you don't stall the data flow from main memory with program code. The "stalls" are what kill throughput. Programmers need to write tight code to get down to this level, though.
It's nice that they're increasing the number of cores per CCD. On the not-so-old 24-core 74F3 (Zen 3) it was limited to 3 cores per CCD (the technology would allow up to 4 max for a 32-core), and the inter-CCD latency was absolutely horrible, making you regret not passing the data via a floppy disk instead, and making it impractical for applications using more than 8 threads. On Zen 4 they increased the limit to 8 cores, which is significantly better for many workloads by allowing applications to run up to 16 threads with great performance. So if they raised that to 16c/32t with Zen 4c, that's even better and awesome news; it will unleash all that power that was not usable before outside of clouds where you sell CPUs one at a time.
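For anyone who wants to try keeping a job's threads inside one CCD, here is a minimal sketch, assuming a Linux box that exposes cache topology under /sys/devices/system/cpu. The function names (parse_cpu_list, l3_groups, pin_to_one_l3) are made up for illustration; the sysfs files and os.sched_setaffinity are standard Linux/Python, but whether pinning helps your workload is something you'd have to measure.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a Linux cpulist string such as '0-3,64-67' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return one sorted CPU list per distinct L3 cache found in sysfs."""
    seen = set()
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index*", "level")
    for level_path in glob.glob(pattern):
        with open(level_path) as f:
            if f.read().strip() != "3":
                continue  # only level-3 caches are of interest here
        shared_path = os.path.join(os.path.dirname(level_path), "shared_cpu_list")
        with open(shared_path) as f:
            cpus = parse_cpu_list(f.read())
        if cpus:
            seen.add(tuple(cpus))
    return [list(group) for group in sorted(seen)]


def pin_to_one_l3(group_index=0):
    """Pin the current process to the CPUs sharing one L3 (Linux only)."""
    groups = l3_groups()
    if not groups or not hasattr(os, "sched_setaffinity"):
        return None  # no L3 info exposed, or not running on Linux
    chosen = groups[group_index % len(groups)]
    os.sched_setaffinity(0, chosen)
    return chosen
```

Calling pin_to_one_l3() at the start of a program keeps its threads on cores that share a single L3, which is the "stay inside one CCD" behaviour described above; picking a different group_index per process spreads several such jobs across CCDs.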
Instinct MI300 already has Zen cores ... So we may get something in reverse: FPGA, HBM2 or CDNA chiplets may be placed on the CPU package someday (if the market demands it). I think the most beneficial will be embedded appliances/accelerators. One chip rules all.
Nvidia’s H100 GPU & Grace CPU is a poor attempt to bring the CPU to the GPU card. H100 has 512G of DDR5 & 128G of HBM3 for CPU. AMD’s MI300A is a better design.
@@NKG416 In essence, yes - a chiplet-based SoC, instead of a monolithic SoC. Technically though, almost every CPU is an SoC to some extent, these days.
Son, you can store that shit on a USB stick in the directory BEFORE and encode it on another dir called ENCODED all on the same USB stick, got to put those numbers up :)
About the Genoa 16-core three-quarter gig cache chip: is that hard-wired 48MB per core? Or if I'm not using all the cores does it dynamically allocate more cache per core? (i.e. if I specifically write my code so that I'm only using 4 of the 16 cores, do I get 192 MB per core?)
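A rough way to reason about this, as a sketch under stated assumptions rather than an official answer: on Zen the L3 is shared among the cores of one CCD, not pooled across the whole package. Assuming the 16-core 768 MB part is built from 8 CCDs with 96 MB of L3 each and 2 active cores per CCD (those layout numbers are my assumption, and l3_per_active_core is a made-up helper), the arithmetic looks like this:

```python
def l3_per_active_core(total_l3_mb, ccds, active_cores_per_ccd):
    """Average L3 share per busy core, assuming L3 is only shared within a CCD."""
    l3_per_ccd = total_l3_mb / ccds
    return l3_per_ccd / active_cores_per_ccd


# All 16 cores busy: 2 cores share each 96 MB CCD (assumed layout).
print(l3_per_active_core(768, ccds=8, active_cores_per_ccd=2))  # 48.0

# 4 threads, each pinned to a different CCD: one busy core per CCD.
print(l3_per_active_core(768, ccds=8, active_cores_per_ccd=1))  # 96.0
```

So under these assumptions the 48 MB is an average, not a hard partition: a lone core can use its whole CCD's L3, but the ceiling would be about 96 MB per core rather than 192 MB, because cache on idle CCDs can't be borrowed.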
@Level1Tech, do you know if the Linux kernel and the tool chain have had CPU specific optimizations added to them yet for these new CPUs? If not, I'm curious how much difference that would make.
The kernel itself has nothing new that is specific to these chips; the improvements would come from better NUMA arrangements for the respective use cases. But there are apps in the server space (data/big data/compute/client serving) that could be configured to use more cache to improve in-memory performance, or to run more numerous but slightly lighter concurrent processing pipelines per socket/blade/rack...
The Genoa and GenoaX graphs are labeled as 2P. Did you limit the test to 96 cores on those systems, or are those running with 2x 96 cores vs 128 on Bergamo? Will there be dual Bergamo boards too, since they are basically drop in compatible with Genoa?
8:52 Symmetric or Simultaneous? I know there's Simultaneous Multithreading (SMT) and Symmetric Multiprocessing (SMP), but what's Symmetric Multithreading?
Well, unfortunately there are several reasons for that, number one being the fact that the T.A.M. on enterprise x86 is much larger than that of consumer dGPUs. Also, enterprise customers ACTUALLY buy items on empirical data, whereas consumers, like consumer psychology has shown us for a long time, buy items for a host of irrational reasons, and this is what makes Nvidia's monopoly so hard to break. AMD could objectively release better GPUs that offer more performance for the dollar and people would STILL keep buying Nvidia just due to perception and the parasocial relationship Nvidia fans have formed with the brand. That, and Nvidia spends as much on GPU R&D as AMD spends on both x86 and GPU R&D, and then AMD still has to compete with Intel at the same exact time, and Intel is spending over 3x as much as AMD on R&D... In other words, AMD is competing against two rivals at the same time and both of them have access to significantly more financial resources, which makes it all the more impressive that AMD's able to do as well as it does, but also explains the limitations of the resources AMD has to compete against Nvidia.
The money is in data center CPUs and laptop CPUs. AMD almost went bankrupt prior to Zen. Add that general consumers will buy Nvidia even if AMD produces a superior GPU, and you start to see why AMD focuses on CPUs.
Last gen AMD gpus were superior, the radeon 6000 gpus were more power efficient and had often superior performance to nvidia, yet people still bought nvidia. This generation AMD is faster and has more vram for less price at every pricepoint yet people are talking like you "I wish they did the same great job with gpus"
@@SomeUserNameBlahBlah AMD is superior for GPUs at every price point this generation aside from the 4090, and power efficiency isn't a real argument considering how widely ignored it was last gen when nvidia was losing badly at it. A 7900xt is 100$ less than a 4070ti, nearly 10% faster and has 20gb of vram vs 12gb, does it really matter it uses 70w more power?
I've said this for a while about AMD's IO die strategy. If they really wanted to, there is nothing stopping them from putting a Zen 3 compute die onto an AM5 package, perhaps for a low-cost AM5 CPU, or going the opposite direction and putting a Zen 4 compute die onto an AM4 package.
I would have to wonder how Zen 4 would be impacted by the relative lack of memory bandwidth on DDR4, if a theoretical Zen 4-AM4 CPU were to exist. Not to mention how exactly it would interface with, say, a B550/X570 chipset.
while it would work, each core is designed for the IO die it's supposed to be paired with. AMD themselves said they can't increase the core counts (more than 16 cores in consumer desktop) unless the memory bandwidth increases too, since the cores would be data starved. They could do that but the performance would suffer
@@VideogamesAsArt Years ago Intel said it would not do more than 2 cores per memory channel. So dual channel, 4 core is what we got. We now have the 13900k with 24 cores total. So I'm not really inclined to believe whatever either company says.
They absolutely are concerned - Intel has continued to lose market-share in Server and HPC for almost 5 years now. Now that they've had to do some down-scaling in order to cut costs, you can bet they are more concerned than ever. If the trend continues, which it honestly appears to be, then the two companies might end up being about the same size.
Good to hear, Word was getting a bit sluggish. This might just fix it. Joking aside, THAT is some real powerful CPU. I am sure some webhosts and other hosting services will LOVE this one. Not to mention businesses doing important calculations QUICKLY and so on. Scientists will drool at the idea of such power, the amount of complicated models they can throw at that!
I'm sorry if this is a dumb question but what does he mean by "The chiplets are behind the I/O Die" at 04:45? In the official slide from AMD shown at 05:05 they're very clearly not behind the I/O Die, as the corresponding 8x16 chiplets constituting 128 cores are located on top of the substrate at the sides of the I/O Die where they've always placed them. I understand that it's me that's missing something but I'm curious to find out what exactly. So please, pretty please 🥺🙏 I'd really appreciate it if anyone can help clarify what kind of chiplets they employ besides the cores on the Bergamo CPUs, the ones Wendell is talking about that are placed under the I/O Die?🙏 ✌🖖
It's not about the physical location, it's about the interface. The IO die is responsible for communicating between the CPU and the rest of the computer. In that sense the chiplets are behind the IO die: think of them as standing behind a door, that is the IO die, and the rest of the computer is on the other side.
@@steffennilsen2132 😆OH!…that..🥴..that just makes A LOT more sense 😁. I for some reason just got stuck parsing it way too literally. Thx 🙏 appreciate it! ✌️🖖
I find myself wondering if Zen4C or its successors might find a home in an asymmetrical setup like the 7900X3D and 7950X3D, e.g. one 8 core chiplet with 3D V-Cache and one 16 core Zen4C chiplet with less cache but either the same or higher clocks? (I'm aware that we don't have data on how Zen4C would clock in that situation.) Making the Performance and Efficiency cores be the same fundamental core design with different cache structures intrigues as a notion.
This is so cool! I want to see this be taken to the extreme some day. Maybe a big-little core design. Put 2 of those 16 core chiplets on a package, together with 10 hypothetical 64 core (small weaker cores) chiplets and bam: 672 core server. Bye competition, bye GPUs!
In the next gen desktop 3D chips it would make sense to have a 3D V-Cache die and a Zen 4c die. Especially if they improve the Infinity Fabric so the 16 core chiplet can derive some benefit from the other die's cache. Won't help in games but it could be both a gaming and productivity beast. I also think that they should put some L4 over the IO die.
Any chance you can run a COMSOL benchmark of some kind? I do a lot of COMSOL simulations and they all run on a per core basis and not per thread. I currently have a dual socket 9554 EPYC system with 1.8TB of RAM.
Hey Wendell, I have been following you for a while now. I do have an interesting workload for you, maybe. We have deployed a vSAN cluster with 100Gig wire speed, however our Intel CPUs struggle very hard to get anywhere close to wire speed. They top out at about 40Gig/s. This is already running multiple iperf instances and jumbo frames. I wonder if those AMD CPUs can handle network IO any better?
OK Seriously We need another Level1Tech and Linus Media collab here...... Dual or quad socket Epyc 128Core Vcache monster VM gaming LAN PARTY...... Then a separate video Inviting bunch of either streamers or pc content creators to duke it out..... !!!!!! WHO WOULD LOVE TO SEE THIS. HECK THEY CAN EVEN DO IT IN A LIVE STREAM FOR BOTH......
Can you test the 1gb 3dvcache cpu in simulation heavy games vs 7800x3d, like lategame stellaris or factorio. And other cpu heavy titles like 1080p cyberpunk with rt on 4090
STOP IT Wendell! I JUST FINALLY got an E5 2690 V4 and now you tell me bout this thing so I feel poor again just when I was feeling like a king :( :( :( Unrelated note - if you want to support your local poor king who needs servers for kingly things you can just let me know, I know a guy whos looking.....for the good of the....something or other
100% AMD will pretty much remove the L3$ from the CCD and only have L3$ added to the CCD with the V-Cache. Instead of the cache being sandwiched between the cores, it seems logical to me to have the cache above/below the cores.
Would be great. But for now connections were removed to save space. Maybe when they axe "whole" L3 cache, they will use some space from this to have connections and have a L3 die on top.... or bottom of the package?
The problem with having cache above the cores is that the cores create heat, and that heat would go into the cache, which is very sensitive to high temperatures. Hence, currently 3D cache is only on top of the L3 cache already on-die. Zen 5 will be the same, but I expect Zen 6 to fully utilise V-Cache from scratch (since Zen 6 design started after V-Cache was already functioning).
@@VideogamesAsArt Yes. That is why they are also experimenting with cache being below and some other tricks with core topology. Details are unknown to me, but I'm certain that they are thinking about it really hard and have some of the smartest people on Earth working on it.
you are better off with something like a threadripper. I would say Threadrippers are better performance/price than Epyc, Epycs are sold in bulk to big companies.
@Level1Techs Can you answer me these questions? With such an enormous amount of L3 cache, can these be used (for fun and testing) with no "system/main memory"? If not (most likely not), can you show some kind of test that hits the CPU hard but doesn't need a lot of RAM, and then test it with the full complement of RAM vs a single 8GB stick? That would be a fascinating test to see. I cannot suggest any tests for you, but I noted that you saw some impressive increases in "DwarfFortress" with massive L3.
Level1, I was wanting to build one last PC for this gen that crushes editing clips and can play whatever game I throw at it no matter the settings. Is the Genoa-X or the Epyc Bergamo worth it? Can they game?
The best videos are the ones where you're excited about the subject. AMD has put themselves in such a good spot with chiplet design.
Intel has done well but only in the past 4 years due to AMD's continuous innovation. Thanks to Dr. Lisa Su they have made very strategic short- and long-term technological and business decisions that will reap their full benefits probably within the next 4 years across the board. Basically completing 10 years since ZEN/EPYC/NAVI came on the scene.
@@chrisbullock6477lol SR drew even with two year old stuff after three years of dev and small socket stuff uses multiple times the power to do the same work as Zen 4
1.1 GIGABYTES of cache... More on die cache than my first computer had RAM. Truly incredible.
More cache than my first PC had hdd space. Win 3.1
@@nicholassmile5800 how far we've come...
You must be young. My first PC had 1MB (that's megabyte) of RAM. Many CPUs these days have that much or more L2 cache *for just one core* than that!
Now, I'm sure there are people who started out with an IBM XT or something. Even older to you, then (LOL).
@@catsspat you'd be correct, mid 20s. The first home computer we had was an XP machine from Gateway (rip). Lots of good memories learning the ins and outs of what is still basically the same thing we use today. I'm just glad we didn't get an ME machine...
Yeah that's nuts. That's 3 times the amount of memory my first computer had on the hard drive. It was some old IBM. Can't remember what model. Windows 3.1 on it.
Cities:Skylines 2 is promising to fully take advantage of all cores, stating the only limit on the simulation size will be your computer. I would love to test it on these to see just how it would compare to, say, the 7950x. If it did make a difference I could see hardcore CS:2 players becoming Epyc fans.
You would get much more from 1GB extra V-cache than have 128 4c cores
Is C:S2 using an agent based simulation model? I guess that should scale up embarrassingly well.
@@lukabozic5 you don't get the 1 gig for all cores tho.
@@mos6581com we don't know yet - but it looks like it from the way they are talking about how the cims evaluate decisions they make for transportation, education, happiness, etc.
btw, you do realise how disgustingly expensive this is, for a single game right?
on a side note: factorio tho!
Lisa Su seems to be the Getafix of AMD, not designing anything but brewing the magic potion for the engineers
I love when AMD wins.
More competition for Nvidia, Intel and Apple is good for us.
One of the reasons the F series sometimes pulls ahead is because of less pressure on the L3 cache, some workloads with poor cache locality will basically thrash the L3 if there are too many cores in a CCX doing it at once.
Great overview! Especially when you have a drink every time Wendell says "dominates".
It's been a while for me since I've seen Wendell in a video. You're looking great Wendell, keep it up!
He's been forced to be a vegetarian by a tick bite. Ask him about it, it's crazy.
I bet Intel is now wishing that they could just glue cpus together.
Intel’s Alder Lake from 2 years ago has chiplets. They use a silicon interposer to create a true on-chip tile design. AMD's chiplets are connected through the PCB, which has delay penalties, but it is a cheaper and easier method than Intel’s tile system.
Remember them clowning AMD for its chiplet design? Guess that engineer is selling hotdogs now.
@@boemlauw to be fair, AMD's chiplets are on a PCB, lol. Intel's chiplets are connected through a silicon wafer. It's a big engineering challenge to go from a PCB to a silicon interposer. That is something AMD will have to overcome as the inter-chip latency becomes more important.
They do but the result is Sapphire rapids... or Sapphire sluggish I would say 😂😂
@@slimjimjimslim5923 Alder lake absolutely does not use chiplets. They're all monolithic.
Thanks for including some single thread benches. Far too many people tend to ignore those since the total system throughput has become such a vital stat these days, but for some workloads the latency is also important.
Wow this is brutal. Everybody knew, but nobody expected this magnitude of annihilation.
Hi Wendell, amazing presentation and performance! Just wanted to reach out: I've been maintaining the #t2sde Linux distribution for 20 years already, and it would be amazing if you had some SSH CPU hours to spare to try how epyc the thread ripping performance of building a whole Linux distribution world would be! Thanks! (re-post as 1st post disappeared, ... :-)
Can confirm. René is doing the lord's work and it would be great to see René and the Level1Techs crew work on a project together for the bettering of the opensource world.
Impressive piece of silicon and an impressive video Wendell!
If someone from the future had told me in 2016 that AMD would have a 128 core monster of a CPU just 6 years later I wouldn't have believed it. The most impressive part to me is how the embrace of chiplets allowed them to maximize yield and their wafer capacity, and paired with that, the way they managed to alleviate the chiplets' inherent drawbacks (remember the first Epyc, it was great in some workloads but poor in many others). This embrace of chiplets also allowed them to iterate faster and trickle down the advancements to the consumer chips too. Finally, the last strokes of genius are the AVX implementation (from poor to better and more consistent than Intel across the product stack) and the c cores with full functionality.
Truly, truly impressive for a company that was on the brink of bankruptcy. I hope they can do the same on the GPU front.
That will hopefully make the past gen 32 and 64 core parts more affordable for the rest of us.
Imagine those get dumped on 2nd hand market for under 1k$
@@GewelReal already happened if you know where to look.
LV1Tech never disappoints with checking out these enterprise chips! Thanks for the awesome coverage!
..."Pay Attention Class"....yes Mr. Wendell...
Just think how crazy things would be when this stuff hit the used market. Holy crap!!!
Have some of these workloads hit a limit on their scheduling capabilities? Some of those Bergamo 9754 results seem anomalous in how bad they are.
The 9784X is a beast by all accounts.
1.1GB of cache...holy crap, my first PC had 1 MB of memory, 4x256k chips, the cache for the chip was external on the board, and I think it was 32kb for that particular board...though it could have been 16 or 64, I don't really remember 100%, and a 40 mb HDD...and honestly, I never thought I would need more, it was so overkill for that old 20 mhz 386 at the time, oh how far we've come in such a short period of time.
Does this mean we're likely to see 16-core single-ccd X3D Ryzen CPUs in the near future? That would be a monster of a gaming CPU.
The vast vast majority of games hasn't been truly CPU bound for ages. Also, while multicore support is pretty much a given these days, many games can't properly utilize more than 6 or 8 physical cores.
I'm just happy AMD's datacenter success is paving the way for Zen 5 R&D.
If we include the consoles then I have bought a Zen CPU of every generation except Zen 1, so far every Zen generation has seen huge uplift and I'll probably go to upgrading three generations in a row if the advancement momentum carries to Zen 5.
This rocks. Exciting times for sure
Im absolutely blown away tbh
I think it would be nice if those lidded CPUs came with some drawing or other type of visual indication that lets us know exactly where the heat generation centers are located, for better thermal paste application.
Imagine being that guy who just 3D printed his chop block. 20 people looking over your shoulder as it goes 'CRACK' .... WORLDSTAR!!!!
Now AMD has "building blocks", as in packages, for heterogeneous multi-socket solutions, with 4c+4x3d for mem/cache intensive and/or core intensive NUMA nodes ;)
On the other hand, working in corporate big data processing I can see benefits to 4c... where previously devs had to pull double duty by rebuilding for up to 128 cores and by replatforming for ARM ISA, now they can focus only on core usage... or they can focus on better large cache optimisation options for x3d.
Best coverage of these amazing new chips❤
Remember those HFT focused Xeons from 2009, those "2 core" monsters that ran at 4.4GHz, but were really a 6 core with 4 cores disabled, and then all of the cache available to the 2 enabled cores? Man, the X5698 was an amazing beast, especially if you could firmware unlock those cores and water cool it.
Imagine they do that here, 24 cores, 1.1GB of cache, up to 92MB of cache per core if you only use one core per CCD
I can still see where some of my workload might benefit from such a chip. (More cache per processor the better for large matrix inversions, f'rinstance)
Are there more recent xeons than 2009 that use such a "disable some cores to reallocate cache" strategy? Maybe a 16-core that is limited to 4 cores?
Very exciting stuff!!
Like the good, old days of Opteron! Keep going AMD! 😊😊😊
With 128 cores, all my tasks in machine learning and preprocessing of large databases (except for deep learning) will be faster than ever.
Is this for work or for home fun? Either way would be interested in you sharing more
@@pauljones9150 i am freelance
Jesus dude, i havent seen a video from this channel for an entire year and your weight loss makes you look younger.
NGL, you should be part of AMD Wendell, you do a good job of marketing their server side of things.
I keep looking at Genoa I currently have a 7551 Epyc for my workstation, but definitely my next rig will be Epyc again.
I think my only gripe is the lack of normal IO on server mobos. A lot of people prefer Threadripper, but I honestly love having IPMI. Asus rack is my favourite though, with no license cost for the IPMI, unlike Supermicro.
Machine learning right now is kinda hard to get into fully, I run a Xilinx Alveo u250 FPGA accelerator for NN's but they're not super convenient to accelerate all types of models, or even port to.
I hope that AMD is working on bridging the gap though with their new cards, I am a little disappointed that the Xilinx/AMD FPGA side of things has paywalls for some of their accelerators, as a dev I need to use these cards to actually learn to go into production environments, and don't want to buy the potato dev boards they have, I'd rather have what the company is going to deploy.
For AMD to have a full edge on NV cards I think we need:
- FPGA portion (which can be locked and changed in groups of cells)
- A plug and play solution to just using these accelerators with PyTorch/Llama models without converting to proprietary Xilinx stuff.
AMD/GPU creators:
- please, make GPU/accelerator cards' memory upgradable again, with AI we need to be able to put in more than 256GB of VRAM, without having to buy another card when our model gets too large... I want to run more than one network at once...
I would like to see EDA benchmarks. Specifically between these AMD beasts and NVIDIA's cuLitho. Although I see how difficult that would be when attempting to compare apples to apples. Still, it would be fun
Interesting to see potential performance uplifts for specialist tasks~
That's some crazy stuff they got going on.
You should try running Lavapipe Vulkan and LLVMpipe OpenGL/OpenCL on it.
I want to see this for sort of weird Digital Humanities projects -- stereophotogrammetry (although it'd need a good graphics card too), machine learning of old scribal hands, stuff like that.
It’d be really interesting if you did something with Oxide Computer, considering the amount of the stack they have reimplemented/rethought.
I wonder about the thermal solution and the chip temperature in this situation. The V-Cache chiplets have notably worse heat dissipation than the regular CCDs on the consumer platform. No thermal tricks? Just a drop-in replacement under the same coolers? With 400W in that density, it seems wild to me...
There is no 9784X, as the 7 stands for 128 cores. You mean the 9684X right? Would be nice to have this clarified in the description or in a pinned comment
I was looking for this comment!
Lisa Su has done amazing work with AMD that us consumers really benefit from.
Sadly I don't remember where I saw it, but recently I came across some kind of compatibility list or spec sheet or something that mentioned a 128 core CPU. Then it said AMD somewhere and I was like "LOL someone must've mixed that up with an ARM CPU, EPYC only goes to 96 cores!" Guess it wasn't a mix-up after all.
Sure as FUCK wasn't! ;) God bless AMD for keeping the pressure on the market - no CPU-manufacturer can sleep well as long as they're still executing - keep lighting that fire!
Once all the program code fits in the cache you don't stall the data flow from main memory with program code. The "stalls" are what kills throughput. Programmers need to write tight code to get down to this level though
It's nice that they're increasing the number of cores per CCD. On the not-so-old 24-core 74F3 (Zen 3) it was limited to 3 cores per CCD (the technology would allow up to 4 max for a 32-core), and the inter-CCD latency was absolutely horrible, making you regret not passing the data via a floppy disk instead, which made it impractical for applications using more than 8 threads. On Zen 4 they increased the limit to 8 cores, which is significantly better for many workloads because applications can run up to 16 threads with great performance. So if they raised that to 16c/32t with Zen 4c, that's even better and awesome news. It will unleash all that power that was not usable before outside of clouds where you sell CPUs one at a time.
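For anyone who wants to keep a job inside one CCD to dodge that inter-CCD latency, here is a rough Linux-only sketch in Python. It groups logical CPUs by which ones share an L3 and can pin the current process to one group. Assumptions on my part: that `cache/index3` in sysfs is the L3 on your machine (check the `level` file next to it), and the helper names `parse_cpu_list`, `l3_domains` and `pin_to_first_l3_domain` are made up for illustration.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-7,64-71' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_domains(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share one L3.

    Assumes cache/index3 is the L3 cache, which is typical but not guaranteed.
    Returns an empty list if the sysfs files are not present.
    """
    domains = set()
    pattern = "cpu[0-9]*/cache/index3/shared_cpu_list"
    for path in Path(sysfs_root).glob(pattern):
        domains.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(domains)


def pin_to_first_l3_domain():
    """Restrict the calling process to the CPUs of the first L3 domain (Linux only)."""
    domains = l3_domains()
    if not domains:
        return None
    os.sched_setaffinity(0, domains[0])  # pid 0 means the calling process
    return domains[0]


if __name__ == "__main__":
    # Only prints the topology; call pin_to_first_l3_domain() yourself to pin.
    for cpus in l3_domains():
        print(len(cpus), "logical CPUs share an L3:", cpus)
```

Whether pinning actually helps depends on the workload, so treat this as a starting point for experiments rather than a tuning recipe.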
Instinct MI300 already got Zen core ... So we may get something in reverse, FPGA, HBM2 or CDNA chiplets may placed on CPU package someday (if market demand it), I think the most beneficial will be embedded appliance/accelerators. One chip rules all.
Nvidia’s H100 GPU & Grace CPU is a poor attempt to bring the CPU to the GPU card. H100 has 512G of DDR5 & 128G of HBM3 for CPU. AMD’s MI300A is a better design.
that's...kind of SoC isn't it?, like apple M1-M2
@@NKG416 In essence, yes - a chiplet-based SoC, instead of a monolithic SoC. Technically though, almost every CPU is an SoC to some extent, these days.
4:05 It's a good angle, when you're sitting like that. Somehow is better received.
Brownouts widely anticipated.
I wonder how fast this cpu would H265 encode my 500GiB video library. What a madlad!
Son, you can store that shit on a USB stick in the directory BEFORE and encode it on another dir called ENCODED all on the same USB stick, got to put those numbers up :)
@@boemlauw this is totally irrelevant to the video about a monster cpu. And usb sticks, really?
I thought there was supposed to be a link to benchmark results, but I don't see it.
About the Genoa 16-core three-quarter gig cache chip, is that hard-wired 48MB per core? Or if I'm not using all the cores does it dynamically allocate more cache per core? (i.e. if I specifically write my code so that I'm only using 4 of the 16 cores, do I get 192 MB per core?)
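A back-of-the-envelope take on the question above, as a sketch rather than an authoritative answer. On Zen parts the L3 is shared among the cores of one CCD, not pooled across the whole package, so a core can only allocate into its own CCD's slice. The layout below (8 CCDs of 96 MB each for the 768 MB, 16-core part, so 2 cores per CCD) is my assumption and not stated in the thread, and the function names are made up for illustration.

```python
def l3_reachable_per_core_mb(total_l3_mb, num_ccds):
    """L3 a single core can allocate into: only its own CCD's slice."""
    if num_ccds < 1:
        raise ValueError("num_ccds must be at least 1")
    return total_l3_mb / num_ccds


def l3_fair_share_per_core_mb(total_l3_mb, num_ccds, active_cores_per_ccd):
    """Even split of one CCD's L3 among the cores actually in use on that CCD."""
    if active_cores_per_ccd < 1:
        raise ValueError("active_cores_per_ccd must be at least 1")
    return l3_reachable_per_core_mb(total_l3_mb, num_ccds) / active_cores_per_ccd


TOTAL_L3_MB = 768  # "three-quarter gig" from the comment
NUM_CCDS = 8       # assumed layout: 8 CCDs x 96 MB, 2 cores each

# All 16 cores busy: 2 active cores per CCD share 96 MB.
all_cores_share = l3_fair_share_per_core_mb(TOTAL_L3_MB, NUM_CCDS, 2)

# One active core per CCD: that core has the whole 96 MB slice to itself.
one_per_ccd_share = l3_fair_share_per_core_mb(TOTAL_L3_MB, NUM_CCDS, 1)

print(all_cores_share, one_per_ccd_share)
```

Under those assumptions the 48 MB per core is a share, not a hard partition, and running 4 cores on 4 different CCDs would give each up to 96 MB rather than 192 MB, because the other CCDs' cache is not available to them as local L3.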
@Level1Tech, do you know if the Linux kernel and the tool chain have had CPU specific optimizations added to them yet for these new CPUs? If not, I'm curious how much difference that would make.
There are no architectural differences you could optimize for with these.
Kernel itself nothing more or new specific for these chips, the improvements would come from better NUMA arrangements for respective use-cases, but there are apps in server space in data/big data/compute/client serving that could be configured for more cache used to improve in memory performance or for more in number but slightly less heavy concurrent processing pipelines per socket/blade/rack...
I can't wait for the threadripper version
The Genoa and GenoaX graphs are labeled as 2P. Did you limit the test to 96 cores on those systems, or are those running with 2x 96 cores vs 128 on Bergamo? Will there be dual Bergamo boards too, since they are basically drop in compatible with Genoa?
Good question - 2x 128 cores system... that's pretty... EPYC. :O
8:52 Symmetric or Simultaneous? I know there's Simultaneous Multithreading (SMT) and Symmetric Multiprocessing (SMP), but what's Symmetric Multithreading?
I want to buy one for gaming. Where can I get one??
I wish AMD would do the same great job with their GPUs as they do with CPUs
Well, unfortunately there are several reasons for that, number one being the fact that the T.A.M. on enterprise x86 is much larger than that of Consumer dGPUs. Also, enterprise customers ACTUALLY buy items on empirical data, whereas consumers, like consumer psychology has shown us for a long time, buy items for a host of irrational reasons, and this is what makes Nvidia's monopoly so hard to break. AMD could objectively release better GPUs that offer more performance for the dollar and people would STILL keep buying Nvidia just due to perception and the parasocial relationship Nvidia fans have formed with the brand. That and Nvidia spends an amount on GPU R&D that AMD spends on both x86 and GPU R&D and then AMD still have to compete with Intel at the same exact time and Intel is spending over 3x as much as AMD on R&D....in other words, AMD is competing against two rivals at the same time and both of them have access to significantly more financial resources, which makes it all that more impressive that AMD's able to do as well as it does, but also explains the limitations of the resources AMD has to compete against Nvidia.
The money is in data center CPUs and laptop CPUs. AMD almost went bankrupt prior to Zen. Add that general consumers will buy Nvidia even if AMD produces a superior GPU, and you start to see why AMD focuses on CPUs.
Last gen AMD gpus were superior, the radeon 6000 gpus were more power efficient and had often superior performance to nvidia, yet people still bought nvidia. This generation AMD is faster and has more vram for less price at every pricepoint yet people are talking like you "I wish they did the same great job with gpus"
@@PineyJustice AMD needs several generations of superior GPUs to cut into Nvidia. Otherwise people won't shift.
@@SomeUserNameBlahBlah AMD is superior for GPUs at every price point this generation aside from the 4090, and power efficiency isn't a real argument considering how widely ignored it was last gen when nvidia was losing badly at it. A 7900xt is 100$ less than a 4070ti, nearly 10% faster and has 20gb of vram vs 12gb, does it really matter it uses 70w more power?
So there is theoretically room to move to a 12 x 16 configuration for 192 cores?
Yay! Thanks For this!
Cool, faster access to cat pictures and bath tub livestreams
I've said this for a while about AMD's IO die strategy. If they really wanted to, there is nothing stopping them from putting a Zen 3 compute die onto an AM5 package, perhaps for a low-cost AM5 CPU, or going the opposite direction and putting a Zen 4 compute die onto an AM4 package.
I would have to wonder how Zen 4 would be impacted by the relative lack of memory bandwidth on DDR4, if a theoretical Zen 4-AM4 CPU were to exist. Not to mention how exactly it would interface with, say, a B550/X570 chipset.
while it would work, each core is designed for the IO die it's supposed to be paired with. AMD themselves said they can't increase the core counts (more than 16 cores in consumer desktop) unless the memory bandwidth increases too, since the cores would be data starved. They could do that but the performance would suffer
@@VideogamesAsArt Years ago Intel said it would not do more than 2 cores per memory channel. So dual channel, 4 core is what we got. We now have the 13900k with 24 cores total. So I'm not really inclined to believe whatever either company says.
I wonder if Intel is at all concerned. I also want to see benchmarks of the 16-core V-Cache part in games.
They absolutely are concerned - Intel has continued to lose market-share in Server and HPC for almost 5 years now. Now that they've had to do some down-scaling in order to cut costs, you can bet they are more concerned than ever. If the trend continues, which it honestly appears to be, then the two companies might end up being about the same size.
There was a prototype 2 chiplet 3dv cached Zen 4. But the advantages are not that high.
Cue Homer looking at donuts.
Good to hear, Word was getting a bit sluggish. This might just fix it.
Joking aside. THAT is some real powerful CPU. I am sure some webhosts and other hosting services will LOVE this one. Not to mention businesses doing important calculations QUICKLY and so on. Scientists will drool at the idea of such power, the amount of complicated models they can throw at that!
How about seeing how many points per day you can get in Folding@Home?
I'm sorry if this is a dumb question but what does he mean with "The chiplets are behind the I/O Die" at 04:45? In the official slide from AMD shown at 05:05 they're very clearly not behind the I/O Die, as the corresponding 8x16 chiplets constituting 128 cores are located on top of the substrate at the sides of the I/O Die where they've always placed them.
I understand that it's me that's missing something but I'm curious to find out what exactly. So please, pretty please 🥺🙏 I'd really appreciate it if anyone could help clarify what kind of chiplets they employ besides the cores on the Bergamo CPUs, the ones Wendell is talking about that are placed under the I/O Die?🙏
✌🖖
It's not about the physical location, it's about the interface. The IO die is responsible for communicating between the CPU and the rest of the computer. In that sense the chiplets are behind the IO die: think of them as standing behind a door, that is the IO die, and the rest of the computer is on the other side.
@@steffennilsen2132 😆OH!…that..🥴..that just makes A LOT more sense 😁. I for some reason just got stuck parsing it way too literally.
Thx 🙏 appreciate it! ✌️🖖
I find myself wondering if Zen4C or its successors might find a home in an asymmetrical setup like the 7900X3D and 7950X3D, e.g. one 8 core chiplet with 3D V-Cache and one 16 core Zen4C chiplet with less cache but either the same or higher clocks? (I'm aware that we don't have data on how Zen4C would clock in that situation.) Making the Performance and Efficiency cores be the same fundamental core design with different cache structures intrigues as a notion?
I feel I need this on a ITX SFF build for some reason.
This is so cool! I want to see this be taken to the extreme some day. Maybe a big-little core design. Put 2 of those 16 core chiplets on a package, together with 10 hypothetical 64 core (small weaker cores) chiplets and bam: 672 core server. Bye competition, bye GPUs!
I can't wait to get a $99 EPYC cpu/motherboard combo from china in 10 years. 🤣
Not too many years ago, people paid over $1k usd for an XT computer.
@@aleksandrbmelnikov I had an ibm XT in the 80's, kept it way into the 2000's for a server until it finally died. 🤣
3 years.
If I need a server for SQL Database and image compression (JPEG 2000 lossless) - with or without V-Cache?
Did I miss the costs of the systems?
In the next gen desktop 3D chips it would make sense to have a 3D V-Cache die and a Zen 4c die. Especially if they improve the Infinity Fabric so the 16 core chiplet can derive some benefit from the other die's cache. Won't help in games but it could be both a gaming and productivity beast. I also think that they should put some L4 over the IO die.
Is there anything new for socket sWRX8?
Incredible
Any chance you can run a COMSOL benchmark of some kind? I do a lot of COMSOL simulations and they all run on a per core basis and not per thread. I currently have a dual socket 9554 EPYC system with 1.8TB of RAM.
We have some threads on the forum with comsol it's an odd duck under windows but more consistent on linux .. videovappn I hope
Hey Wendell, I have been following you for a while now.. I do have an interesting workload for you, maybe.
We have deployed a vSAN cluster with 100Gig wire speed, however our Intel CPUs struggle very hard to get anywhere close to wire speed. They top out at about 40gig/s.
This is already running multiple iperf instances and jumbo frames.
I wonder if those AMD CPUs can handle network IO any better?
Post to the forum an iperf spec
Sweet wooden floor!
Please show some cinebench =)
3D vcache perfect random high load shifts
would be interesting to see the difference between Ryzen 7950x3d vs these servers.
OK Seriously We need another Level1Tech and Linus Media collab here...... Dual or quad socket Epyc 128Core Vcache monster VM gaming LAN PARTY...... Then a separate video inviting a bunch of either streamers or PC content creators to duke it out..... !!!!!! WHO WOULD LOVE TO SEE THIS. HECK THEY CAN EVEN DO IT IN A LIVE STREAM FOR BOTH......
how about RAS?
Can you test the 1gb 3dvcache cpu in simulation heavy games vs 7800x3d, like lategame stellaris or factorio. And other cpu heavy titles like 1080p cyberpunk with rt on 4090
I'm sure it will make a fine upgrade to my plex server
Tech is going TO THE MOON!
Amazing video
STOP IT Wendell! I JUST FINALLY got an E5 2690 V4 and now you tell me bout this thing so I feel poor again just when I was feeling like a king :( :( :(
Unrelated note - if you want to support your local poor king who needs servers for kingly things you can just let me know, I know a guy who's looking.....for the good of the....something or other
100% AMD will pretty much remove the L3$ from the CCD and only have L3$ added to the CCD with the V-Cache. Instead of the cache being sandwiched between the cores, it seems logical to me to have the cache above/below the cores.
Would be great. But for now connections were removed to save space. Maybe when they axe "whole" L3 cache, they will use some space from this to have connections and have a L3 die on top.... or bottom of the package?
the problem with having cache above the cores is, the cores create heat, and that heat would go into the cache, which is very sensitive to high temperatures. Hence, currently 3D cache is only on top of L3 cache already on-die. Zen 5 will be the same, but I expect Zen 6 to fully utilise V cache from scratch (since Zen 6 design started after V cache was already functioning)
@@VideogamesAsArt Yes. That is why they are also experimenting with cache being below and some other tricks with core topology. Details are unknown to me, but I'm certain that they are thinking about it really hard and have some of the smartest people on Earth working on it.
AMD EPYC™ 9184X seems like a nice overkill 😅
16c/32t with 768MB of L3 cache 😎
I'm looking forward to see new Threadrippers 😅
On desktop they can get 5+ GHz, so why not a 12-core 1152 MB? Surely they can find ONE fast core on each die?
@@shanent5793 one GB of L3 on 12 cores is a CRAZY prospect!!!
ThreadRipper editions of these would be some insane game changing shit
There is no way AMD is thrashing perfectly capable, highly sought, and bigger margin EPYC chips on Threadripper.
@@jairo8746 Threadripper 7000 is said to be released Q4.
Is it possible for a mere mortal to build an epyc system for rendering under 8k?
No shot, that cpu alone is easily ~10k+ USD probably closer to 15k USD
you are better off with something like a threadripper. I would say Threadrippers are better performance/price than Epyc, Epycs are sold in bulk to big companies.
Can other PC components even catch up to this speed? Cus if SSDs, RAM and GPUs can't really match this speed then what's the point?
Hi Wendell, what will be the gaming performance impact on KVM running Win10 on a Linux host, let's say passing 7 cores from a 5800X3D or 7800X3D to the VM?
So if you buy a one core vCPU in the cloud for $5 per month you get a 128th of this platform?
can we have a socket g34 version now ;-)
One Gig cache memory..on a CPU 😮
PLEASE! Test LLMs response time compared to A100, H100, 4090
@Level1Techs Can you answer these questions for me? With such an enormous amount of L3 Cache, can these be used (for fun and testing) with no "system/main memory"? If not (most likely not), can you show some kind of test that hits the CPU hard, but doesn't need a lot of RAM, and then test it with the full complement of RAM vs a single 8GB stick? That would be a fascinating test to see, although I cannot suggest any tests for you, but have noted that you saw some impressive increases in "DwarfFortress" with massive L3.
I want one of these monsters, just to play WoW vanilla and watch YouTube videos, and of course to admire the Windows task manager
Level1
I was wanting to build one last PC for this gen that crushes editing clips and can play whatever game I throw at it no matter the settings. Is the Genoa-X or the EPYC Bergamo worth it?
Can they game