Enjoying this at 3am on a Saturday. Keep up😁!
Very interesting! I'm an undergrad comp sci major here at UMN, saw your reddit post, and love this content. I've always been a PC nerd, so actually learning about this stuff is so cool. Makes me more interested in it as I progress through college. Keep up the good work!
Thanks, fella! Glad you are enjoying it!
Another ECE grad student here from UMN. Loving your content! @CoffeeBeforeArch
Although not as popular, and probably not worth that much views-wise, for the few people who watch this it's quite helpful to have a person with experience comment on the material in a book, in a "read and comment" fashion; it's as if the book were a person that doesn't just speak fixed, typeset words :))
This is like fine wine for those who really care about the subject
This is a gold mine. Thank you!
Amazing, I'd love to watch your series
Thanks! Glad you're interested!
Great video and content. Thank you.
Glad you think so!
This is exactly what I was looking for. Thank you for sharing.
Now, in mid-2023, have there been any substantial changes in the industry that might require changes to any information in this video series?
Or is everything presented in this series still just as up-to-date as it was 4 years ago?
Hardware architecture doesn't change as rapidly as software.
Thank you for explaining these.
Happy to help!
Thank you, very informative
awesome playlist
Nice clip, thanks Nick
This is the best online resource I've found for learning GPU architecture and related stuff. One quick question: why can't a CPU have as many cores as a GPU? I mean, what's the necessity of a GPU apart from the specific usage that each is designed for?
CPU cores are designed for a (mostly) different purpose than GPU "cores" (they're difficult to compare directly). A large fraction of the silicon in CPU cores is for mining ILP, while GPU "cores" are optimized for DLP. For example, GPUs have neither branch prediction nor OoO execution. They also have some significant differences in cache coherence (GPUs often have non-coherent L1 caches).
Ultimately, GPUs are mostly designed for massively parallel applications, while CPUs are more general purpose, and have many more optimizations for single-threaded performance (not everything benefits from parallelism).
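As a rough illustration (my own sketch, not something from the video), here is the same vector add written once for a single CPU thread and once as a CUDA kernel. The GPU version just exposes the data-level parallelism and keeps each thread's work trivial, which is the kind of workload all that parallel silicon is built for.

```cuda
// Sketch only: contrasting a single-threaded CPU loop with a CUDA kernel.
#include <cstdio>
#include <vector>

// CPU version: one thread walks the whole array; performance leans on the
// ILP machinery inside a single big core (OoO execution, branch prediction).
void vec_add_cpu(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// GPU version: one lightweight thread per element (pure DLP).
__global__ void vec_add_gpu(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, ha.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // One thread per element: the launch configuration exposes the DLP.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add_gpu<<<blocks, threads>>>(da, db, dc, n);
    cudaMemcpy(hc.data(), dc, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);  // expect 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```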
@@CoffeeBeforeArch Thank you. That was a very quick response 👏
@@podilasahithi7 Happy to help!
Is the document seen in the video accessible?
If you look at the microphone for too long, you begin to see an alien with eyes and mouth
Very awesome series! Thanks
It seems this is the only dedicated book on GPU architecture - there is a chapter in another book from 2022
many thanks
Thank you for such great videos
Why is an SM called a core? As far as I know, SMs have several cores inside them. Are all these cores single threaded? I mean, does a core work on only one thread?
Btw, great content! I really appreciate the time you have put in to explain everything in detail.
The definition of "core" ends up being fairly arbitrary (the same can be said for threads, which are incredibly different between CPUs and GPUs). There are a few logical things that make an SM like a core: it has a private L1 cache, its own private register file, compute resources, and work (thread blocks) scheduled to it.
CUDA cores are perhaps better referred to as execution units. They're where instructions get mapped during execution.
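To make the "work gets scheduled to it" part concrete, here's a tiny sketch (my own example, not from the book) that reads the %smid PTX special register so one thread per block can report which SM its block landed on:

```cuda
// Sketch only: each thread block reports the SM it was scheduled onto.
#include <cstdio>

__global__ void where_am_i() {
    unsigned int smid;
    // %smid is a PTX special register holding the id of the SM this
    // thread (and therefore its whole block) is currently running on.
    asm("mov.u32 %0, %%smid;" : "=r"(smid));
    if (threadIdx.x == 0)
        printf("block %d is running on SM %u\n", blockIdx.x, smid);
}

int main() {
    // Launch more blocks than SMs; the hardware scheduler hands them out.
    where_am_i<<<8, 32>>>();
    cudaDeviceSynchronize();  // flush device-side printf output
    return 0;
}
```

On a small GPU you'd typically see several blocks share the same SM id, which is the "thread blocks get scheduled to SMs" behavior in action.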
@@CoffeeBeforeArch Are the execution units single threaded? I am imagining the latency thing like this: when a thread has to get data, it is stalled; however, other threads are working simultaneously, so the overall throughput is better since many operations are finishing even if one thread is stalled. Is this correct?
@Abhishek Tyagi GPUs fetch instructions at warp granularity. If an instruction is a load that has to go to main memory, a warp scheduler will try to fetch instructions from different warps (if available) that can make forward progress. If an instruction has already made it to an execution unit, it already has the data it needs (with the exception of loads/stores in the load/store queue).
@@CoffeeBeforeArch Thanks!
This video is the missing link needed by programmers that want to expand beyond traditional CPU applications.
What program are you using to read and annotate the book? Many thanks for the video as well.
Have you found out yet? I want to use this too
👍
How does multithreading hide off-chip latency?
EDIT: I tried looking online for resources to read but couldn't find any that explain this :(
If some threads miss in the caches and are waiting on a response from DRAM, you can swap in other threads that have useful work to do instead of just stalling. One of the nice things about GPUs is that the context switching of threads is a cheap operation. This is because all the threads already have their own private sets of registers in the massive register files. So when a long-latency memory access occurs for one warp, a new one can immediately be swapped in to make forward progress.
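If it helps, here is a minimal sketch of the usual pattern (my own example with a hypothetical saxpy kernel, not something from the video): launch far more warps than the SMs can execute at once, so the warp schedulers always have ready warps to issue while others wait on DRAM.

```cuda
// Sketch only: oversubscribing the SMs so memory latency can be hidden.
#include <cstdio>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    // Grid-stride loop: every thread handles many elements, and the launch
    // below creates far more resident warps than can issue in any one cycle.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        y[i] = a * x[i] + y[i];  // the loads here are the long-latency part
    }
}

int main() {
    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    // Plenty of blocks per SM: when some warps stall on DRAM, the schedulers
    // switch to other resident warps, which is what hides the latency.
    saxpy<<<1024, 256>>>(2.0f, x, y, n);
    cudaDeviceSynchronize();

    float y0;
    cudaMemcpy(&y0, y, sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y0);  // expect 0.0 with the zero-initialized inputs

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```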
Getting started with bitcoin mining.
Too many names and too much hand-waving. Could go directly into concepts and examples instead.
30-second ads every 5 minutes! Wtf
saw 1 ad in total
Awesome video! I just find it slightly annoying how often you say "you know", but maybe it's just me ;-P
hyped to continue the series
It is a complete waste of time to talk about GPUs without referring to the type of workload they run. The start of the video says it will not talk about graphics, which is bizarre, because that is the most important workload for a GPU. Maybe you don't know the GFX rendering pipeline. For compute, start with compute shaders and what they do, then go to how a GPU executes a CS. Talking about bits and pieces of the H/W units that form a GPU in isolation is utter nonsense.
So this series follows the book "General-Purpose Graphics Processor Architectures", and these kinds of GPUs do not contain any GFX rendering pipelines (e.g., parts like the H100 are still called GPUs but are built solely for HPC and ML workloads, without any support for graphics).