Bradon Fredrickson
  • Videos: 3
  • Views: 92,592
God of War_20210417235019
God of War
Last Wish (GOLD)
Spread the ashes #PlayStationTrophy
Views: 65

Videos

God of War_20210417235019
33 views • 3 years ago
God of War Last Wish (GOLD) Spread the ashes #PlayStationTrophy
Nvidia GPU Architecture
93K views • 9 years ago
This video is about Nvidia GPU architecture. This is my final project for my computer architecture class at community college.

Comments

  • @kompila
    @kompila 1 month ago

    7 years later this is still legendary

  • @aravinds123
    @aravinds123 2 months ago

    Thanks a lot for this explanation! This is the best video that I could find on this topic!!

  • @Engrbilal143
    @Engrbilal143 7 months ago

    8 years on and it's still the best video available that explains GPU architecture

  • @jayashreebhargava2348
    @jayashreebhargava2348 8 months ago

    Very nice 👍 overview of Nvidia GPU arch

  • @klam77
    @klam77 9 months ago

    VERY useful! Thanks.

  • @pubgplayer1720
    @pubgplayer1720 1 year ago

    This is a nice high level view of architecture without going into too much detail.

  • @shreyabhandare6056
    @shreyabhandare6056 1 year ago

    The only video that explains this topic well. Please consider making more on newer stuff, thank you 🙏

  • @antonidaweber9184
    @antonidaweber9184 1 year ago

    Thank you very much for your work, Brandon! This video contains a lot of useful information. The explanations are simple and concise. And I also find your approach to researching information very inspiring. Step by step you dive into this topic, and although it's hard, in the end you have a well-structured presentation that you kindly share with other people.

    • @bradonf333
      @bradonf333 1 year ago

      Thank you! I really appreciate that. Glad my video could help.

  • @zienabesam4339
    @zienabesam4339 1 year ago

    I liked the way you explain 👍

  • @kartik8n8
    @kartik8n8 2 years ago

    I have come to this video 7 years after it was made, but it's the only source that explains the GPU architecture so well. We need more people like you Brandon. Thanks for this!

    • @bradonf333
      @bradonf333 2 years ago

      Thanks! I appreciate that. Happy I could help! 👍

  • @anphiano4775
    @anphiano4775 3 years ago

    Thank you Brandon, when you explained where the number of CUDA cores comes from, I found it so interesting, and I right away calculated how many SMMs are in my GTX 960 with its 1024 CUDA cores

  • @kartikpodugu
    @kartikpodugu 3 years ago

    Hierarchy: 1. GPU → 2. GPC → 3. SMM → 4. CUDA core.
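
A minimal sketch of the calculation the two comments above describe, assuming the Maxwell-era figure of 128 CUDA cores per SMM (the per-SM core count is an assumption here and differs between Nvidia generations):

```python
# Hypothetical sketch: derive the SMM count from a GPU's total CUDA-core count.
# Assumes Maxwell's 128 CUDA cores per SMM; other architectures use different values.
CUDA_CORES_PER_SMM = 128

def smm_count(total_cuda_cores: int, cores_per_smm: int = CUDA_CORES_PER_SMM) -> int:
    """Number of streaming multiprocessors implied by the total CUDA-core count."""
    return total_cuda_cores // cores_per_smm

# GTX 960: 1024 CUDA cores -> 8 SMMs, which sit inside GPCs on the full chip.
print(smm_count(1024))  # 8
```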

  • @paritoshgavali
    @paritoshgavali 3 years ago

    Very well explained, thanks

  • @sunwrighttrainingschool8138
    @sunwrighttrainingschool8138 4 years ago

    Can you share the ppt with me? alan.wang2121@gmail.com

  • @MrPkmonster
    @MrPkmonster 4 years ago

    Great tutorial. Very clear explanation. Thank you Bradon

  • @PlanViews
    @PlanViews 4 years ago

    Architectural animation software ruclips.net/video/4U3MAR8xyfs/видео.html

  • @zhangbo0037
    @zhangbo0037 4 years ago

    Very helpful for learning graphics, thanks

  • @stizandelasage
    @stizandelasage 4 years ago

    This can accelerate my Unix OS, I really like it

  • @Rowing-li6jt
    @Rowing-li6jt 5 years ago

    Great video!!!

  • @billoddy5637
    @billoddy5637 5 years ago

    As you can probably see, Streaming Multiprocessors are the GPU's equivalent of a CPU core. Furthermore, these CUDA "cores" Nvidia refers to are actually execution units, floating-point FMAs to be precise. They make up the bulk of the SM's execution units. In reality, the number of SMs doesn't really matter: Nvidia tends to change the size of an SM between microarchitectures, so the comparison isn't really useful. I would say comparing the number of CUDA cores as well as the clock speed is probably a more reliable comparison.

    • @hjups
      @hjups 3 years ago

      That depends on your workload. If you have a lot of thread divergence, more SMs with fewer ALUs is better. If you have high utilization and less thread divergence, then fewer SMs with more ALUs would be better. As GPU tasks become more complicated, the amount of divergence increases, so making the SMs smaller is a better approach. The raw peak performance only depends on the number of ALUs and clock speed (not the SMs). However, the maximum practical performance will depend heavily on the number of SMs, depending on the workload. For example, 1 SM with 2048 ALUs at 1 GHz might do 2 TFLOPS but in practice only achieve 40% of peak, whereas 4 SMs with 256 ALUs each might peak out at 1 TFLOPS and achieve 90% of peak. The 40% vs 90% difference could be explained completely by thread divergence. Then you have 2 × 0.4 = 0.8 TFLOPS vs 1 × 0.9 = 0.9 TFLOPS. That's an extreme example though, because the two configurations are a lot closer in raw numbers while being further apart in performance (it has to do with register file pressure, scheduling, work distribution, etc.)
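
A small sketch that reproduces the arithmetic in the reply above, counting one FLOP per ALU per clock (an FMA-based count would double the peak figures); the ALU counts, clock, and utilization numbers are the commenter's illustrative assumptions, not measurements of real GPUs:

```python
# Hypothetical sketch of the peak-vs-practical comparison from the comment above.

def peak_tflops(num_alus: int, clock_ghz: float) -> float:
    """Raw peak throughput: depends only on ALU count and clock, not on SM count."""
    return num_alus * clock_ghz / 1000.0

def practical_tflops(num_alus: int, clock_ghz: float, utilization: float) -> float:
    """Achievable throughput once divergence, scheduling, etc. limit utilization."""
    return peak_tflops(num_alus, clock_ghz) * utilization

# One big SM: 2048 ALUs @ 1 GHz, ~40% utilization under heavy thread divergence.
big_sm = practical_tflops(2048, 1.0, 0.40)        # ~0.8 TFLOPS achieved
# Four small SMs: 4 x 256 = 1024 ALUs @ 1 GHz, ~90% utilization.
small_sms = practical_tflops(4 * 256, 1.0, 0.90)  # ~0.9 TFLOPS achieved

print(f"1 x 2048-ALU SM : {big_sm:.2f} TFLOPS achieved")
print(f"4 x 256-ALU SMs : {small_sms:.2f} TFLOPS achieved")
```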

  • @petergibson2318
    @petergibson2318 6 years ago

    I like this level... down to the hardware. Easy-peasy software like Facebook and Microsoft Office sits on top of the hardware. The cleverness is in the hardware... billions of transistors.

  • @Supperesed
    @Supperesed 6 years ago

    Microarchitecture sounds like "organization", according to you

  • @evabasis4960
    @evabasis4960 6 years ago

    Thank you for the nice video. Can a core run more than one thread at the same time? Or does a CUDA core execute only one thread at a time?

  • @himanshupatra1991
    @himanshupatra1991 7 years ago

    Very nicely explained. Can I get a link to the videos that come before or after this one? I couldn't find the previous one. I want to watch all of those. @Bradon Fredrickson

    • @bradonf333
      @bradonf333 7 years ago

      Himanshu patra Hey, sorry I don't have any more videos. This was just a final project I had to do for school.

    • @himanshupatra1991
      @himanshupatra1991 7 years ago

      Bradon Fredrickson thank you so much for the quick reply. I thought of asking because somewhere in the video you said "I have explained CUDA in the previous class". It is such a nice video. Thank you so much. 😊

  • @SHIVAMPANDEY-rr8in
    @SHIVAMPANDEY-rr8in 7 years ago

    How can we relate warps and grids?

  • @SHIVAMPANDEY-rr8in
    @SHIVAMPANDEY-rr8in 7 years ago

    great!!

  • @JJJohnson441
    @JJJohnson441 7 years ago

    Thanks for this simple, but informative tutorial.

  • @zhikangdeng3619
    @zhikangdeng3619 7 years ago

    Really nice explanation. Thanks for sharing!

  • @223Warlord
    @223Warlord 7 years ago

    Pretty sure their GPUs are more complicated than what you can read on Wikipedia; otherwise any company could easily steal Nvidia's intellectual property.

  • @vladislavdracula1763
    @vladislavdracula1763 8 years ago

    Very well explained. However, cores are not the same as ALUs.

    • @ithaca2076
      @ithaca2076 3 years ago

      True, but they do similar tasks. Although the GPU's cores are more like lots and lots of advanced ALUs with a few special bells and whistles here and there

  • @breezysaint9539
    @breezysaint9539 8 years ago

    Well explained! Thank you

  • @FreakinLobstah
    @FreakinLobstah 8 years ago

    Wow, very well done! It was very helpful for me.

  • @Varaquilex
    @Varaquilex 8 years ago

    Is there any difference between CPU ALUs and GPU ALUs?

    • @bradonf333
      @bradonf333 8 years ago

      I think the main difference is the number of ALUs. GPUs have a lot of ALUs and a CPU only has a few.

    • @ithaca2076
      @ithaca2076 3 years ago

      @@bradonf333 Well, they aren't the same though. GPU ALUs are a bit more advanced, and tend to have features for things like calculating shading, if I recall correctly

    • @hjups
      @hjups 3 years ago

      @@ithaca2076 That's not quite correct, depending on what you mean. CPUs don't typically have one type of ALU anymore; they have an integer ALU (add, subtract, logic, shifts), a multiplier, a divider, and an FP ALU (which is often divided into an FP ADD/MUL ALU and an FP DIV/SQRT etc. ALU). Often those ALUs are combined into common pipelines: for example, an x86 CPU may do {add, subtract, logic, shifts, and multiplication} in one ALU, and then {add, subtract, logic, and division} in another ALU. It also depends on the CPU; some of them have multiply-accumulate instructions, while others don't. If you ignore the addition of a MAC instruction, then a GPU "ALU" is going to be much simpler than most CPU ALUs. The degree to which that's true depends on the GPU architecture: the older ones combined an INT32 and FP32 ALU into a single "core", which may or may not have been unified (i.e. a single pipeline that could do either), or they could have been unique pipelines. The advantage of unique pipelines would be that the latency is lower (fewer cycles), at the cost of area. The current Nvidia architectures have a combined FP32/INT32 pipeline and an FP32-only pipeline. The current AMD architectures have combined FP32 and INT32 pipelines, which is also true for the newer ARM Mali GPUs, as well as PowerVR and Apple's GPUs. Going back to Nvidia though, the FP32 ALUs can only do FPADD, FPSUB, FPMUL, FPMAC, FMA, and I think that's also where the int-to-FP instructions are done. The INT32 ALUs do integer ADD, SUB, MUL, MAC, logic, shift, FP compare, and I think that's where the FP-to-int instructions are done. The "cores"/ALUs don't do anything fancier like SQRT or division. The GPU is actually incapable of doing either of those operations, and instead can approximate them using the Special Function Units (SFU/MFU). Those would not be considered ALUs though. TL;DR: GPU ALUs are in general much simpler than those of a modern (high-end) CPU.

  • @8scorpionx
    @8scorpionx 9 years ago

    Very interesting, thumbs up :)

  • @IgorAherne
    @IgorAherne 9 years ago

    Thanks!