- Videos: 3
- Views: 92,592
Bradon Fredrickson
Joined 28 Jul 2011
Nvidia GPU Architecture
93K views · 9 years ago
This video is about Nvidia GPU architecture. This is my final project for my computer architecture class at community college.
7 years later this is still legendary
Thank you!
Thanks a lot for this explanation! This is the best video that I could find on this topic!!
8 years and still its the best video available that explains gpu architecture
Totally agree 👍
Very nice 👍 overview of Nvidia GPU arch
VERY useful! thanks.
This is a nice high level view of architecture without going into too much detail.
only video that explains this topic well, please consider making more on newer stuff, thank you 🙏
Thanks so much!!
Thank you very much for your work, Brandon! This video contains a lot of useful information. The explanations are simple and concise. I also find your approach to researching information very inspiring: step by step you dive into the topic, and although it's hard, in the end you have a well-structured presentation that you kindly share with other people.
Thank you! I really appreciate that. Glad my video could help.
I liked the way you explain 👍
I came to this video 7 years after it was made, but it's the only source that's explained the GPU architecture so well. We need more people like you, Brandon. Thanks for this!
Thanks! I appreciate that. Happy i could help! 👍
Thank you Brandon, when you explained where the number of CUDA cores comes from, I found it so interesting that I calculated right away how many SMMs are in my GTX 960 with its 1024 CUDA cores.
Hierarchy:
1. GPU
2. GPC
3. SMM
4. CUDA core
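The core-count calculation mentioned above is just a multiplication down the hierarchy. A minimal sketch, assuming Maxwell's figure of 128 FP32 CUDA cores per SMM (the GTX 960 numbers match the comment, but the per-SM figure is an assumption, not from the video):

```python
def total_cuda_cores(num_sms: int, cores_per_sm: int) -> int:
    """Total CUDA cores = streaming multiprocessors x cores per SM."""
    return num_sms * cores_per_sm

# GTX 960 (Maxwell, assumed): 8 SMMs x 128 FP32 cores per SMM
print(total_cuda_cores(8, 128))  # -> 1024
```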
Very well explained thanks
Can you share the ppt with me? alan.wang2121@gmail.com
Great tutorial. Very clear explanation. Thank you Bradon
very helpful for learning Graphics thanks
This can accelerate my Unix OS. I really like it.
Great video!!!
As you can probably see, Streaming Multiprocessors are the GPU’s equivalent of a CPU core. Furthermore, these CUDA "cores" Nvidia refers to are actually execution units. Floating-point FMAs to be precise. They make up the bulk of the SM’s execution units. In reality, the number of SMs doesn’t really matter: Nvidia tends to change the size of an SM between microarchitectures, so comparison isn’t really useful. I would say comparing the number of CUDA cores as well as the clock speed is probably a more reliable comparison.
That depends on your workload. If you have a lot of thread divergence, more SMs with fewer ALUs is better. If you have high utilization and less thread divergence, then fewer SMs with more ALUs would be better. As GPU tasks become more complicated, the amount of divergence increases, so making the SMs smaller is a better approach.

The raw peak performance depends only on the number of ALUs and the clock speed (not the SMs). However, the maximum practical performance will depend heavily on the number of SMs, depending on the workload. E.g. 1 SM with 2048 ALUs at 1 GHz might do 2 TFLOPS but in practice only achieve 40% of peak, whereas 4 SMs with 256 ALUs each might peak out at 1 TFLOPS and achieve 90% of peak. The gap between 40% and 90% could be explained entirely by thread divergence. Then you have 2 * 0.4 = 0.8 TFLOPS vs 1 * 0.9 = 0.9 TFLOPS. That's an extreme example though, because the two configurations are a lot closer in base numbers while being further apart in practical performance (it has to do with register file pressure, scheduling, work distribution, etc.)
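The worked example above can be checked with a quick calculation. A hedged sketch using the comment's own hypothetical numbers (1 FLOP per ALU per cycle, as implied by "2048 ALUs at 1 GHz might do 2 TFLOPS"; the 40%/90% utilization fractions are illustrative, not measured):

```python
def effective_tflops(num_sms: int, alus_per_sm: int, clock_ghz: float,
                     utilization: float) -> float:
    """Peak TFLOPS (1 op per ALU per cycle) scaled by achieved utilization."""
    peak_tflops = num_sms * alus_per_sm * clock_ghz / 1000.0
    return peak_tflops * utilization

wide_sm   = effective_tflops(1, 2048, 1.0, 0.40)  # one huge SM, 40% achieved
narrow_sm = effective_tflops(4, 256, 1.0, 0.90)   # four small SMs, 90% achieved
print(round(wide_sm, 1), round(narrow_sm, 1))  # ~0.8 vs ~0.9 TFLOPS
```

Despite the wide-SM design having twice the raw peak, the divergence-friendly design wins on this (hypothetical) workload.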
I like this level....down to the hardware. Easy -Peezy software like Facebook and Microsoft Office are sitting on top of hardware. The cleverness is in the hardware....billions of transistors.
Microarchitecture sounds like "organization," going by your description.
Thank you for the nice video. Can a core run more than one thread at the same time? Or does a CUDA core execute only one thread at a time?
Very nicely explained. Can I get a link to all the sequential videos before or after this one? I couldn't find the previous one. I want to watch all of them. @Bradon Fredrickson
Himanshu patra Hey, sorry I don't have any more videos. This was just a final project I had to do for school.
Bradon Fredrickson thank you so much for the quick reply. I thought of asking because somewhere in the video you said "I explained CUDA in the previous class". It is such a nice video. Thank you so much. 😊
How can we relate warps and grids?
great!!
Thanks for this simple, but informative tutorial.
Really nice explanation. Thanks for sharing!
Pretty sure their GPUs are more complicated than what you can read up on Wikipedia; otherwise any company could easily steal Nvidia's intellectual property.
Very well explained. However, cores are not the same as ALUs.
True, but they do similar tasks. Albeit, the GPU's cores are more like lots and lots of advanced ALUs with a few special bells and whistles here and there.
Well explained! thank you
Thank you
Wow, very well done! It was very helpful for me.
Thanks!!
Is there any difference in the CPU ALUs and GPU ALUs?
I think the main difference is the number of ALUs. GPUs have a lot of ALUs and a CPU only has a few.
@@bradonf333 well, they aren't the same though. GPU ALUs are a bit more advanced, and tend to have features for things like shading calculations, if I recall correctly.
@@ithaca2076 That's not quite correct, depending on what you mean. CPUs don't typically have one type of ALU anymore: they have an integer ALU (add, subtract, logic, shifts), a multiplier, a divider, and an FP ALU (which is often divided into an FP ADD/MUL ALU and an FP DIV/SQRT ALU). Often those ALUs are combined into common pipelines; for example, an x86 CPU may do {add, subtract, logic, shifts, multiplication} in one ALU, and then {add, subtract, logic, division} in another. It also depends on the CPU: some have multiply-accumulate instructions while others don't.

If you ignore the addition of a MAC instruction, then a GPU "ALU" is going to be much simpler than most CPU ALUs. The degree to which that's true depends on the GPU architecture. The older ones combined an INT32 and FP32 ALU into a single "core", which may or may not have been unified (i.e. a single pipeline that could do either), or they could have been separate pipelines. The advantage of separate pipelines is lower latency (fewer cycles), at the cost of area.

The current Nvidia architectures have a combined FP32/INT32 pipeline and an FP32-only pipeline. The current AMD architectures have combined FP32 and INT32 pipelines, which is also true for the newer ARM Mali GPUs, as well as PowerVR and Apple's GPUs.

Going back to Nvidia: the FP32 ALUs can only do FPADD, FPSUB, FPMUL, FPMAC, and FMA, and I think that's also where the int-to-FP conversions are done. The INT32 ALUs do integer ADD, SUB, MUL, MAC, logic, shift, and FP compare, and I think that's where the FP-to-int conversions are done. The "cores"/ALUs don't do anything fancier like SQRT or division; the GPU is actually incapable of doing either of those operations directly, and instead computes approximations to them using the Special Function Units (SFU/MFU). Those would not be considered ALUs though.

TL;DR: GPU ALUs are in general much simpler than those of a modern (high-end) CPU.
Very interesting,thumbs up :)
Thanks!