Here are my thoughts on Navi 21 (BIG NAVI): www.patreon.com/posts/speculation-on-41441731
nice to see u back in action dude
Didn’t think I’d see you here
Thanks, glad to see you doing well too, your channel growth is great! Love the content, even with the stream-of-consciousness delivery, the enthusiasm level ++ and it all makes sense in the end! :)
Hi Paul. This channel is so interesting.
Even though I had to rewind many parts of the video because of my lack of understanding, I really love your videos. No rumors, no leaks, just hard data. Very entertaining, I would say.
Finally a well made analysis of pros and cons comparison between the 2 competing solutions.
Very good and more in-depth analysis, better than the typical tech channels who just parrot the basic marketing points.
Cheers. There are different audiences, and other tech channels do a good job.
@@nerdtechgasm6502 Awesome videos! Can you continue the analysis when RDNA3 and Lovelace are revealed?
Finally somebody actually does some analysis of the hardware. That is sorely missing and I find that to be far more interesting than simple opinion statements.
I myself have been looking at some of this stuff, but I am a historian. I am pretty good at text analysis, timelines or system analysis, but I lack some of the more concrete, practical knowledge. I figured out long ago that RDNA must have been started around the time Fiji was launched, and I realised just how much the RTG actually managed to do with a budget that must have come down to rubbing two sticks together and hoping you get a fire started. Later on, the RTG got some much-needed strengthening when a number of Zen engineers were moved over to the RTG, and that was made public. That was somewhere in 2017 if I remember correctly. Too late for the groundwork of RDNA1, but probably vital in its completion. For me that was the point to start paying attention. It signalled a clear intent to revitalise the Radeon side of AMD.
I also suspected that RDNA2's RT might surprise a few people, and I knew about the patents. Mark Cerny also talked about the Intersection Engine, which confirmed that patent. I did however miss some of the bandwidth features and some of the practical improvements and implementations to increase utilisation that Microsoft unveiled with its RDNA2 chip.
edit: I also just discovered that alongside DXR, Microsoft also announced DirectML for upscaling. You might be right about ML functions for RDNA2, because I strongly suspect that the XSX will support DirectML. That implies RDNA2 support, and unlike DLSS, this will be the standard because it uses DX12 and is already implemented in W10.
www.overclock3d.net/news/software/microsoft_s_directml_is_the_next-generation_game-changer_that_nobody_s_talking_about/1
RDNA was started conceptually the moment Mark Papermaster decided to lure Raja back from Apple. Internally, they must have realized that they made the wrong bet in making GCN so compute-focused, as gaming revenue is still going to be a major factor in the health of the company moving forward. But the console deals that ensured AMD survived meant whatever new architecture came next had to be built on GCN ISA compatibility. It was a tough challenge for the team, but they delivered.
With regards to ML, I think that having one dual-CU per array with improved SIMD-32 that can handle 32x32 tensor ops should be fast enough for an effective ML-based upscaling feature. More would only shift away from the gaming graphics focus of the architecture, and CDNA is what is designed for those other markets.
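To put a rough number on that idea, here is a back-of-envelope sketch. The configuration (array count, one dual-CU per array reserved for ML, packed-FP16 rates, the clock) is my assumption purely for illustration, not a confirmed RDNA 2 spec:

```python
# Hypothetical back-of-envelope, NOT a confirmed RDNA 2 spec:
# how much FP16 throughput would one dual-CU per shader array give an
# ML upscaler, and what per-pixel budget does that leave at 4K60?

shader_arrays    = 4        # assumed big-chip configuration
dual_cus_per_arr = 1        # dual-CUs reserved for ML work, per the comment
simds_per_dualcu = 4        # 2 CUs x 2 SIMD-32 each
lanes_per_simd   = 32
flops_per_lane   = 4        # packed FP16: 2 FMAs = 4 FLOPs per lane per clock
clock_hz         = 2.0e9    # assumed game clock

fp16_flops = (shader_arrays * dual_cus_per_arr * simds_per_dualcu
              * lanes_per_simd * flops_per_lane * clock_hz)

pixels_per_sec_4k60 = 3840 * 2160 * 60
print(f"FP16 throughput reserved for ML: {fp16_flops / 1e12:.1f} TFLOPS")
print(f"Per-output-pixel budget at 4K60: {fp16_flops / pixels_per_sec_4k60:.0f} FLOPs")
# ~4 TFLOPS and ~8000 FLOPs per pixel under these assumptions, which is in
# the ballpark of a small per-pixel network or filter, i.e. plausibly "fast
# enough" without dedicating more of the die to ML.
```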
Great video. It turns out the clue to how "Big Navi" could only need a 256-bit VRAM bus was in the Xbox slides all along, but you're the only one I've seen who really latched onto it.
Yes, Sampler Feedback is really good at reducing memory bandwidth bottlenecks. There are also some other novel features, per AMD's patents, for CU, L1$ and L2$ efficiency. There is a very strong chance that RDNA 2 may be much more bandwidth efficient than even NV's latest GPUs.
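To make the Sampler Feedback point concrete, here is a purely conceptual sketch (made-up texture sizes, not the actual D3D12 API): if the feedback data tells the streaming system which mip levels were actually sampled last frame, only those need to be resident instead of the whole chain.

```python
# Conceptual illustration only: why feedback-guided streaming saves
# memory and bandwidth. Made-up numbers, not the D3D12 Sampler Feedback API.

def mip_chain_bytes(base_dim, bytes_per_texel=4, levels=None):
    """Bytes for a square texture's mip chain (all levels, or just `levels`)."""
    total, dim, level = 0, base_dim, 0
    while dim >= 1 and (levels is None or level < levels):
        total += dim * dim * bytes_per_texel
        dim //= 2
        level += 1
    return total

full_chain = mip_chain_bytes(4096)  # keep everything resident, worst case
# Suppose feedback says only mips 2..4 (1024^2 down to 256^2) were sampled:
needed = sum(mip_chain_bytes(d, levels=1) for d in (1024, 512, 256))

print(f"Full mip chain resident:    {full_chain / 2**20:5.1f} MiB")
print(f"Only sampled mips resident: {needed / 2**20:5.1f} MiB")
print(f"Saved: {100 * (1 - needed / full_chain):.0f}% of the residency/streaming cost")
```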
Thanks for the video. Subbed almost right away.
I'd highly recommend editing your audio, a bit more clarity and volume would really help us phone viewers.
You rock man, i feel nurtured by your content!
Thank you very much, this helped me a lot to understand the architecture behind RDNA2
Man so good to have this channel back. Never forgot about it. Keep em coming...
Fantastic! Absolutely informative and in depth video. Thanks so much for the upload NerdTechGasm, you never disappoint and are always on top form.
Your channel is so underrated! I don’t understand how could someone promote Coretex channel instead of yours!!
Coreteks content is really interesting and he presents it AAA style. :)
OH HEYYY THERE! A wild nerdtechgasm appears.
Great to see you back man, hope you're doing well. Looking forward to seeing more videos.
Stay safe and take care :)
Thanks Dan, good to see you've been busy making content too.
Amazing content and welcome back!
Great analysis!
One thing you should look to change is the audio volume; it's too low. I had to put everything at 100%, and the accent doesn't help with understanding (these are hard topics for me).
You are unique on youtube. Congratulations! And thanks once again!!
Incredible analysis. It’s so good to have you back. 🍻
Fantastic work.
Thanks again.
Best video on the next gen consoles! Just subscribed Amazing work. A series on what it would take for photorealism on 10th gen would be great.
Subscribed to your channel for this good video, really worth my time and i really appreciate it. Thank you mate.
Great video! Thx so much for the technical breakdown, really enjoyed 8)
Ray tracing is not a selling point for me, nor is DLSS.
It's why I'll buy Zen3 and Big Navi later.
Going to be fun
Excellent video
audio is a bit low for us listening on phone speakers. the captions helped though
Wow, really extensive explanation. Keep up the good work!
Great in-depth video, thank you!
Ridiculously small sub count considering these are the best architecture videos on youtube
Ty. Hopefully it will grow as I am more consistent with new content.
Excellent video:) As a note if you want to improve further, just get a better mic. Samson Q2U is the holy grail of quality to price, almost the same quality as a Shure SM7B which costs 10x more! Plus it comes with accessories like a wind screen filter, desk stand AND includes usb ADC on board if you don't want to use the XLR!
@CSIS SM58 is still more expensive for much less. 100usd vs 70usd, without any of the accessories that you will need, and it requires that you have an audio interface.. plus the tone of the Q2U is closer to the SM7B than the SM58. There is literally no competition here for a starter mic. Q2U is unbeatable at that price. edit: Wrote SM8 instead of SM7B
I had a better mic, wife lost it in our house move..
@@nerdtechgasm6502 At least she didn't break her iphone x phone screen just 1 month after you replaced it for her on your own..i'm talking from experience here XD
@@St0RM33 Women... gotta love 'em! :D
Thanks for the analysis mate, I had to re-watch to better understand haha. I’m excited (and getting impatient!) to see what AMD brings with RDNA2 and in particular big Navi
Me too. All this new tech, like a kid in a candy store..
NerdTechGasm and we all appreciate your insight, it’s great to have you back now that Ampere and RDNA2 are about to do battle :-)
SkinWorks ! lol
Jensen: Am I a joke to you?
Jensen is more like a LeatherWorks kind of guy ;p
Did not expect such an in depth and technical breakdown of gpu architecture. Its fascinating, though I admit most of it goes over my head. Are there plans to make a video explaining the basics?
This is as basic as you can get for architecture analysis. For basic graphics pipeline in general, I believe there are several channels that do that better than me. :)
Good content, subbed
Oh damn I was looking for some info about RDNA 2 and HOLY SHIT you over-deliver compared to the rest!
You seem very knowledgeable. I've been scratching my head for a long time, maybe you can help me.
What does the relationship between memory speed and bus width (i.e. bandwidth) have to do with performance? I know that if the bandwidth is low the card can't quickly stream all the assets stored in VRAM, but I see a lot of people saying 256-bit is optimal for the 3070. How do they come to such a conclusion?
Anytime the SM/CUs run threads, they have to fetch data; if it isn't in the cache, the request goes to VRAM. Faster VRAM reduces the cycle wait latency; typically GPUs have to wait 200 or more cycles to start getting something from VRAM. Faster VRAM also allows the data being fetched to transfer quicker, and it can transfer faster to the many SM/CUs that constantly make requests. If VRAM bandwidth is not enough, the SM/CUs idle and you lose performance. The MC is usually partitioned into 32-bit blocks; the more you have, the more concurrent transfers you can handle, essentially. At the cost of die space, extra power usage and a more expensive PCB (more layers for memory traces).
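For the "256-bit is optimal for the 3070" type of claim, the arithmetic is usually just bus width times per-pin data rate, then a sanity check of bandwidth per unit of shader throughput against earlier cards that weren't starved. A quick sketch with illustrative numbers:

```python
# Peak VRAM bandwidth = (bus width in bits / 8) * per-pin data rate (Gbps).
# Illustrative numbers; check the exact memory spec of the card in question.

def peak_bandwidth_gb_s(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin

bw_3070 = peak_bandwidth_gb_s(256, 14)   # 256-bit GDDR6  @ 14 Gbps -> 448 GB/s
bw_3080 = peak_bandwidth_gb_s(320, 19)   # 320-bit GDDR6X @ 19 Gbps -> 760 GB/s
print(bw_3070, bw_3080)

# "Optimal" is then argued by comparing GB/s per TFLOP of shader throughput
# against previous cards that were known not to be bandwidth-starved.
print(f"{bw_3070 / 20.3:.1f} GB/s per TFLOP")   # ~20.3 paper FP32 TFLOPS (3070)
```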
5Head material right here!
Awesome, I love your work!
Just one suggestion, if I may. Can you try adjusting your mic volume next time? It's kinda low.
Thoughts on David Wang returning to AMD and replacing Raja as the engineering head of RTG? From what I can gather on the internet, David Wang was actually one of the engineers leading the development of GCN 1.0 as a compute-oriented architecture, so it has actually come full circle for David Wang, overseeing RDNA replacing GCN.
David and Raja go back to the old ATI days. They've worked together on many architectures. You can find more info on Raja in particular in my other video. David is a very talented engineer, no doubts about it. But they aren't CEOs or even up there in terms of rank, so they do what is asked of them; that covers David working on GCN (co-developed with Sony).
Nice analysis!
Do you think asynchronous compute is under-utilized by game developers? Will we be seeing more of it? I'm not well versed in graphics tech, but it seems that there has been some slow progress in this area.
It's linked to DX12 & Vulkan, which are under-utilized. So your question is answered when we get more games using these APIs.
@@nerdtechgasm6502 Thank you, makes sense. But I wonder why the uptake for these has been slower? Takes time to learn? Ramping up new code bases? Difficult to implement? Diminishing returns? Maybe DX12 and Vulkan are not mature enough yet? Or maybe development engines haven't packaged/supported it very well? Can't help but notice that Microsoft Flight Simulator 2020 is still on DX11, and I assume that is a game that would benefit quite a lot in this respect.
@@aquaticborealis4877 It's a combination of all of the things you mention. Game engines also have generations like hw; they are iterated on over a long period of time. Radical changes don't happen frequently. The other major factor is that the consoles were relatively weak, and this limited the scope of what game designers could build.
Because of this, what they build is nowhere near the limits of what DX11 can handle. So there is little need to use DX12 and Vulkan for performance reasons in these games. The next-gen consoles will change all of this though.
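As for why async compute itself helps, here is a toy timing model; the numbers are invented purely to show the scheduling idea, not measured data from any engine:

```python
# Toy model: a raster/bandwidth-bound graphics pass leaves the shader ALUs
# partly idle; an independent compute pass can overlap that idle time
# instead of running afterwards. Invented numbers, illustration only.

gfx_pass_ms     = 6.0   # e.g. shadow/depth pre-passes
gfx_alu_idle    = 0.5   # assumed fraction of that time the ALUs sit idle
compute_pass_ms = 2.5   # e.g. light culling, particles, post-processing

serial_ms  = gfx_pass_ms + compute_pass_ms
overlap_ms = min(compute_pass_ms, gfx_pass_ms * gfx_alu_idle)
async_ms   = gfx_pass_ms + (compute_pass_ms - overlap_ms)

print(f"Serial submission:  {serial_ms:.1f} ms for this slice of the frame")
print(f"With async compute: {async_ms:.1f} ms")
```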
Skinworks makes my skin crawl
Word of advice: when comparing the PS5 to the XSX and stating that the XSX has just a shader/compute advantage because it does not have an increased number of other blocks, check the TMUs, which are directly bound to the CUs just as the SIMDs are. And while you are at it, check the ROP count, dispatch capabilities, cache volume, ...
One can say that there is still a faster interconnect at a higher clock. But that's ultimately of no relevance when what matters is the ability to keep each working part utilized, and to get the required data sets into place where they are needed in time, thanks to better data granularity and the ability to actually work with data in its compressed form.
I did mention Series X should have an RT advantage due to more TMUs. :) As for ROPs or cache, we don't have that info public, at least not when I made this vid. AMD's ROPs have been decoupled from the MC or CUs, so without a specific architecture reveal from MS..
Great Video! Now I'm picturing a (made-up) Nvidia employee whose entire role consists of thinking up new "-Works" features specifically designed to cripple the AMD graphics architecture. His office would resemble a Red-Team Torture Chamber, complete with wailing fans and geometry-heavy workloads to torture high CU count AMD GPUs of yesteryear.
He would also have developed a single-number metric specifically measuring the difference in frametime cost between Nvidia and AMD architectures so as to rank potential technologies according to how well they "widen the gap" between the two companies. "Skin-works is 3.5 jG (Jen Hsun Giggles). You think that's bad? Wait till Marble-Works next quarter- it's almost 6.2 jG!"
Fun fact, your mental picture is very close to reality. Jensen is a fierce competitor.
I am quite surprised that I understood quite a lot from this video even though I lack the basics of the subject; that just proves you are great at bringing complex topics down to a more casual level ^^
If you have time, a video on just the basics of how GPUs (AMD and Nvidia and maybe Intel) work could benefit you and your videos in the future, like battlenonsense did with his "netcode 101" video
I try to strike a balance: being just technical enough that the average person with an interest in hw can understand or appreciate the architecture changes. Sometimes it is not entirely accurate, but for that you would have to attend a comp sci lecture on 3d rendering (too long and boring).
Great vid!
For a tech channel, the audio volume is really low
Excellent video otherwise, learned a lot!
nice another video. i hope you keep on making new videos, good luck
If only you uploaded more frequently :(
I really enjoy these types of videos.
Ah yes! The tech architecture talk I'm here for. ** picks up tea ** Let's have it.
thanks it was very informative!
Another very interesting topic is AMD APUs. Why do you think there has been no progress in APUs since 2017? Are they waiting for DDR5 to become available for consumers because DDR4 is such a bottleneck that you can't scale performance above Vega 11, or do they want HBM to become cheaper so they can implement it for huge gains, or do they just want to sell cheap low-end discrete GPUs? I just don't get it; a massive APU with 8 Zen cores and a GPU equivalent to an RTX 3050-3060 would be so disruptive and could kill Nvidia's entire low and mid segment
Hi Raven, what you are describing (a massive APU) is basically the Series S/X and PS5. If MS actually wants to, they can easily put the SOC from the Series S into a notebook or NUC and corner that market. As for why AMD hasn't done it themselves: they have to be careful not to overstretch limited $ and manpower across many projects, and these kinds of things are best left to OEMs. As for regular desktop APUs, DDR bandwidth is THE limiting factor preventing more powerful iGPUs.
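Rough numbers to show the gap, using illustrative memory configurations rather than any specific product:

```python
# Why desktop APU iGPUs hit a wall: the whole SoC shares DDR bandwidth,
# while even a small discrete card gets dedicated GDDR. Illustrative configs.

def ddr_bandwidth_gb_s(channels, bits_per_channel, mega_transfers_s):
    return channels * bits_per_channel / 8 * mega_transfers_s / 1000

print(f"Dual-channel DDR4-3200: {ddr_bandwidth_gb_s(2, 64, 3200):.0f} GB/s, shared with the CPU")
print(f"Dual-channel DDR5-6000: {ddr_bandwidth_gb_s(2, 64, 6000):.0f} GB/s, still shared")
print(f"128-bit GDDR6 @ 14Gbps: {128 / 8 * 14:.0f} GB/s, dedicated to a small dGPU")
```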
Didn't AMD disable wave32 functions in their drivers?
Not that I know of. It can run both wave32 and wave64, and will run them just fine relative to GCN; if the game can fill wave64 with work, it's a-ok (a CU is 2x SIMD-32, each with its own dispatch every cycle, so no issues with wave64). The only downside I can think of with wave64 on RDNA would be a reduction in registers available to each thread, but RDNA still has many more VGPRs available.
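A simplified occupancy model shows the register-pressure trade-off; the register-file size and wave cap below are my assumptions for illustration, not official RDNA figures:

```python
# Simplified model of waves in flight per SIMD vs. registers per thread.
# Assumes a SIMD32 with 1024 wave32 VGPR "slots" (128 KB register file)
# and a cap of 20 waves; a wave64 uses two slots per allocated VGPR.

def waves_per_simd(vgprs_per_thread, wave_size, rf_slots=1024, max_waves=20):
    slots_per_wave = vgprs_per_thread * (wave_size // 32)
    return min(max_waves, rf_slots // slots_per_wave)

for vgprs in (32, 64, 96, 128):
    print(f"{vgprs:3d} VGPRs/thread -> "
          f"wave32: {waves_per_simd(vgprs, 32):2d} waves, "
          f"wave64: {waves_per_simd(vgprs, 64):2d} waves")
```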
Hm, I'm likely thinking of a temporary measure then.
No! Excuse me sir, about that Raja / Wang situation, I really want that speculation. It has always been a mystery to me whether AMD's failure to produce a decent GPU was due to Raja's incompetence or in-house corporate skullduggery. So educated guesses are very much welcome
I don't like speculating when the topic lacks info to base the speculation upon. Makes it too error prone. It's already risky enough using official info and documentation, such as my previous Vega videos with architecture analysis, and I got that wrong due to the fact that those claimed architecture features were never there.
@@nerdtechgasm6502 Yeah, but you made some insightful remarks right there from 5:28 - 6:41. I feel it would be really interesting if someone somehow could lay out a timeline of sorts and see what comes of the overlap between funding/defunding, changes in GPU Dept. management, departures... anything related, to see what picture comes back.
Questions:
Is Raja Navi's real father? If Navi 2 is a success, regarding Raja, would it be a case of stolen valour?
Does he deserve recognition? Or is Raja ultimately incompetent, a person who set unrealistic, unachievable goals driven by his delusional ego? Was Raja in reality a burden?
In relation to Intel, is Raja going to be an extraordinary individual or a toxic asset?
Was Raja kept in the dark regarding NAVI?
Was Raja's departure planned years in advance (as long as Navi was in the making), with Wang taking care of Navi secretly in the opposite corner of the building?
etc....
@@lesguil4023 I can't answer most of them since I lack data, however, I can answer this: Raja is definitely not incompetent, and he is very passionate. He and David Wang have a lot of respect for each other, as well as for Mark Papermaster (the major reason Raja returned to AMD is due to these two).
Navi is the result of the graphics team's efforts, when Raja was in charge. That was what AMD hired him for, to help steer the graphics division towards a new architecture for the next decade.
It's similar to Jim Keller being the face of the CPU team. He is not responsible for Zen's development; the R&D team did it, mostly Mike Clark and Suzanne Plummer's teams. However, Keller is a key figurehead that the engineers bounce their ideas off, and provide regular updates to. Lots of quality checks have to be done on many different engineers' work.
It's a team sport, or rather, multi-team sport.
@@nerdtechgasm6502 well man, thanks for the response. That much I understood to be true, that this is a team effort in a marathon kind of race, and a lil' bit of industrial espionage ;)
Keep up the good work. There is more substance in one of your videos than in 100 vids of some other techtubers.
Congratulations.
Between this and AdoredTV there is excellent technological insight, and it's all for free. Thanks.
If AMD can do the same as Nvidia's 3000 series without you having to upgrade the PSU, it's a win!
Looking back RDNA 2 was a great competitor to NVIDIA's 30 series. The closest AMD had come in a long time to nearly taking the lead over NVIDIA. This was a great architecture, and still is!
TSMC 7nm versus Samsung 8nm. If nVidia had TSMC for the 3000 series, it would've been a bl00dbath.
Now that nVidia has the node advantage, RDNA 3 pales in comparison.
The 7900m barely matches the 4080 mobile, let alone the 4090 mobile.. And the former came out after the latter.
Even Intel beat AMD on upscaling/raytracing tech. A CPU company.
@@jayclarke777 You sound like a fanboy, bro. AMD had a pretty solid lineup for RDNA 2; NVIDIA makes great GPUs, but their high price relative to performance isn't worth it.
>"Its a great time to be a gamer and tech enthusiast"
This aged like milk...
I want to add here for future viewer's sake - NerdTech is no longer a logical actor. He has become a Kremlin talking head and spreads anti-establishment thought-terminating cliches and rhetoric that aligns with non-State pro-Russian actors.
He cannot be relied upon for anything anymore.
Great vid
I'm still waiting on the chiplet gpus
Two new vids
We are not worthy
I was honestly expecting the series s (if it was even real) to be like.. 50$ cheaper, but at 300$ it's... yaaaa... It's all coming together. Let's see if AMD turns the tables and offers 2080TI performance at 300$ in rast and raytracing.
That said, with the method AMD went for, routing the raytracing with the textures - would HBM2 be a far superior option compared to GDDR6 due to the massive bit size?
AMD can't be as aggressive on price as MS and Sony, since those two rely on subscription services and a cut of each game sold on their platform. As for HBM2, yes, we may see it on the very high end where the cost factors make more sense.
@@nerdtechgasm6502 Ya, I'm keeping my fingers crossed for AMD to come out swinging.
I can't find HBM2E prices anywhere, but surely the price differences between 2 12GB stacks and the extra layers NVidia had to go through + new power delivery system, are extremely minor?
Of course normal HBM2 with 2 stacks of 8GB will still have my interest even if it only matches the 3080 because I want that extra VRAM for Blender without having my wallet ripped out of my face.
@@1ch1r1n HBM2 prices have remained high, as it's a low-volume product and all the AI, HPC and supercomputing companies (there is a wider ecosystem besides AMD + NV) buy it all out, which makes it a poor choice for high-volume consumer products.
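For reference on the earlier "massive bit size" point, the spec-sheet arithmetic looks roughly like this; the pin rates are illustrative examples, not the figures for any particular product:

```python
# Rough spec-sheet bandwidth comparison, illustrative pin rates only.

def bandwidth_gb_s(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin

print(f"2x HBM2E stacks (1024-bit each @ 2.4 Gbps): {2 * bandwidth_gb_s(1024, 2.4):.0f} GB/s")
print(f"256-bit GDDR6  @ 16 Gbps:                   {bandwidth_gb_s(256, 16):.0f} GB/s")
print(f"320-bit GDDR6X @ 19 Gbps:                   {bandwidth_gb_s(320, 19):.0f} GB/s")
# HBM wins on bandwidth per watt and board area, but the interposer and
# low-volume pricing are what keep it out of mainstream consumer cards.
```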
The tech king has spoken, other channels hold his beer.
Great video. Thanks for the clear explanations. Much more useful than the usual armchair math that leakers put out.
I've quite enjoyed the video but the sound quality is worse compared to your older videos.
Yes, my better mic got lost in a "boating accident", as the wife said.
Thanks for the information. A well thought out video.
Do you think one RDNA2 CU can match one Ampere SM in performance at similar clocks based on the new modifications? I ask because there is a school of thought that the 46SMs of the 3070 will perform better than the 52CUs of the RDNA2 in the Xbox Series X. This would mean RDNA2 is a bit poor especially when the GPU of the Series X has more CUs, VRAM and bandwidth compared to the 3070.
Series X is pretty potent, though so is the PS5. A lot of the early estimates of its true performance come from rushed ported games that are not even optimized for RDNA 1. It is, in essence, running GCN-optimized code on an RDNA 2 Series X in compatibility mode. As you should know, optimization is the key to extracting peak performance. IMO, Series X has much more effective performance than a 2080S-class desktop GPU.
We should see truly next-gen games on these new consoles probably a year after launch, giving devs some lead time with devkits, and also factoring in cross-platform engines (many studios still have to make games for PS4/XbX currently). The first gen of games on a new console tends to be more optimized for the previous consoles, with some minor improvements for the new console.
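For context, the paper numbers behind that "46 SMs vs 52 CUs" school of thought are simple arithmetic on public specs, which is exactly why they mislead:

```python
# Paper FP32 throughput = shader lanes x 2 ops (FMA) x clock.
# Paper TFLOPS are not directly comparable across architectures; Ampere's
# 128 "FP32" lanes per SM share half their issue slots with INT work.

def fp32_tflops(lanes, clock_ghz):
    return lanes * 2 * clock_ghz / 1000

print(f"Series X (52 CU x 64 lanes @ 1.825 GHz):  {fp32_tflops(52 * 64, 1.825):.1f} TFLOPS")
print(f"RTX 3070 (46 SM x 128 lanes @ ~1.73 GHz): {fp32_tflops(46 * 128, 1.73):.1f} TFLOPS")
# The real-game gap is much smaller than these paper figures suggest.
```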
Nvidia Foreskin Works TM
Does the recent news of AMD's buffed up cache system affect this ray tracing analysis? In the end I think I'm going for the card with the least performance hit with ray tracing enabled. Looks like that might end up being AMD
If it's implemented in RDNA 2, it should help with RT performance a great deal. twitter.com/nerdtechgasm/status/1313253181036490753
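A quick way to see why a big on-die cache changes the picture, using hypothetical hit rates and cache bandwidth since AMD hadn't published figures at the time:

```python
# Effective bandwidth with a large last-level GPU cache (hypothetical numbers):
# effective_bw ~= hit_rate * cache_bw + (1 - hit_rate) * vram_bw

def effective_bw_gb_s(hit_rate, cache_bw, vram_bw):
    return hit_rate * cache_bw + (1 - hit_rate) * vram_bw

vram_only  = effective_bw_gb_s(0.0, 0, 512)       # 256-bit GDDR6 @ 16 Gbps
with_cache = effective_bw_gb_s(0.6, 1600, 512)    # assumed 60% hits, 1.6 TB/s cache

print(f"VRAM only:        {vram_only:.0f} GB/s")
print(f"With large cache: {with_cache:.0f} GB/s effective")
# Ray traversal makes scattered, latency-sensitive accesses, so a fast
# on-die cache should reduce the RT performance hit in particular.
```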
NerdTechGasm interesting stuff. Love your vids btw!
Heads melted. Gonna shot some gin and hope 30 mins will recharge my old matter.
Wow rich in content. Thanks
it's been a long time between vids...
I'm gathering ray tracing is still in its infancy with devs, and not too many games will push it to its max; even Nvidia's cards will burst into flames if totally pushed. Seems AMD has the burst voltage/frequency headroom and the advantage of lower voltages, now readying for the next node @ 5nm
It's only been a few days since my last vid... ? ;)
They MUST NAIL THE DRIVERS! If the cards don't work properly, you never know how good it can be.
I suspect the sleepy giant Nvidia has some surprises coming when AMD beats or nearly beats their 3080 with a much smaller GPU footprint. With AMD's leapfrog design and development engineering strategy, AMD's Navi 3 will likely be a Zen 3 moment, crushing Nvidia's market-share advantage.
Too complicated for me to fully understand but I got the idea. I think.
I cover more on the rendering pipeline in my Vega: Of Primitive & Pixels video. It should provide more explanations that will help you understand better.
bonjour
your video is very quiet.
I will fix the mic for the next one, cheers.
@@nerdtechgasm6502 danke schön.
;)
Good content but awful sound quality.
Thanks, I will try to fix the audio for the next videos.