please continue, I've been searching for ages to find this amazing content
I invest a lot of time researching and making the animations so each video takes a couple of weeks to make. Please be patient
Wtf bro how the hell is this channel so underrated??? Dude keep doing what you're doing. It's absolutely breathtakingly brilliant
Thanks! I put a lot of effort into making the animations. I hope these videos are helpful to people.
@BitLemonSoftware keep it up mate.. it's amazing
There’s no need to swear. We’re all on your side here.
That's just how YouTube works: not everyone is blessed by the algorithm, and this niche is not so watchable or entertaining, but it is very educational.
This is one of the best explanations I've seen. I'll have to rewatch this video to fully grasp the content of it, thank you so much for making this!
You are welcome!
Great video! Keep it up. My class teacher from Brazil has just shared this video in our class group.
Wow, I didn't realize I was making such an impact
here before this channel blows up, very informative and beautiful man, keep it up
Thanks! Hopefully it will reach more people
A detailed video. It shows that a lot of effort went into the research and animation. Great work!
Thanks!
For normal memory, the CPU sends out the address, which then goes through the address decoder to select a row.
With a cache, we first fetch the addresses (tags) stored in the set. Then, in a second step, we address the cache line (row) in a second memory bank. So a cache access needs two cycles.
Therefore, the N64 uses TMEM, not a cache. The Atari Jaguar uses on-chip scratchpad memory. And even there an access has 3 cycles of latency (vs. 2 for the register file). Atari calls this extremely fast memory.
It would be faster to use large registers. The Jaguar can load 64 bits at once. ARM uses a barrel shifter in every instruction to select packed bit fields.
Likewise, the CPU should just fetch very long instruction words. So long that one word can contain REP SCAS and the 68k instructions of the same kind.
It is all true, but too detailed for this level.
This was not meant as a criticism of the video, just a hint to follow up on. A lot of CS seems to be detached from the electrical side. I think this was already visible in the 80s, when Intel, WDC, and Motorola ran at higher clock speeds than the fabless RISC CPUs that came out of theoretical CS, which offered incompatibility for an insane price.
keep it up man this is gonna blow up!
Thanks for the support!
Keep it up, loved the video and graphics!
Thanks! I appreciate the support
I am pretty sure this channel is gonna blow up very soon, I love your content!
I'm glad you like it. Hopefully it will reach more people
I'm your 293rd subscriber. This channel is criminally underrated
Thanks for the support! The channel is still young, hopefully it will grow soon.
789 🫡
823!!!!
929
2349 ... growing fast :D
This was amazing, never heard it explained so well.
I am glad it was helpful!
Older games used to do all of these things. It's not that modern games are discovering this stuff; they're returning to it.
Yeah I agree with the other commenters, your channel is underrated.
Let me fix that...liked and subbed!
Thank you for the support!
same, great video
This is really amazing. Thank you for the great work. I hope you have fun making this content, because it is fun for us to watch.
Thanks! I do enjoy making these videos and I hope they actually help people and I'm not just wasting time
Keep it up, bro. Great video ❤.
Thanks!
Very insightful content. Keep going, man. From your 739th subscriber.
Thanks!
Damn this was really great. Please do more!
Thanks! I will
I hope more people find your channel like I did today. It's a very good video with a great explanation. If you accept suggestions, please apply this visualization methodology to explain some algorithm problems, for example the dining philosophers problem in C. Thanks, I would love to see that.
Thank you! Of course I take suggestions. These types of videos take a lot of time to make, but I do want to touch on interesting algorithm problems, and will get there eventually.
Great explanation!
Thanks!
Awesome video! I’m subscribing right now.
Welcome aboard
Sir, you're doing a great job here. As a certified Digital Marketer, I strongly believe you should leverage Google Ads (YouTube Ads). Soon, people will flood your channel.
Thanks!
I only skimmed, but it looks very good, nice, clear explanations.
great explanation
Thanks
Keep growing bro ❤❤❤
This high-quality channel has not yet reached 1k subscribers.
Indeed. Does that surprise you?
Hey, great video! Liked it and subscribed to your channel
Thanks! I really appreciate the support
Sometimes I love YouTube recommendations ❤
I'll take it as a compliment 😁
Great content! Keep up the good work.
As a next step: how do multi-core CPUs handle their cache state and keep it in sync between cores? That's a part I've never understood well enough.
Thanks! The sync between caches is done using the MESIF/MOESI protocols (or other derivatives).
It's on my ToDo list and I will dedicate a whole video to it.
Within the first five seconds I knew you were Israeli, you absolute legend.
Well done, thanks!
Glad it was helpful!
It should be mentioned here that in a direct-mapped cache every RAM address maps to exactly one cache location. But since RAM is bigger than the cache, multiple RAM addresses (RAM bytes / cache bytes of them) map to the same cache location. That is why, even though the cache is only 32 KB, all 36 address bits are still used: the upper bits are stored as a tag, so that an access to a RAM address that is not currently in the cache is detected as a cache miss.
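To make that mapping concrete, here is a minimal Python sketch of how a 36-bit address could be split for a 32 KB direct-mapped cache. The 64-byte line size and the name `split_address` are assumptions for illustration only; a different line size changes the bit counts but not the idea.

```python
CACHE_BYTES = 32 * 1024   # 32 KB cache
LINE_BYTES = 64           # assumed cache line size
ADDRESS_BITS = 36         # enough to address 64 GB of RAM

NUM_LINES = CACHE_BYTES // LINE_BYTES                # 512 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1            # 6 bits pick a byte in the line
INDEX_BITS = NUM_LINES.bit_length() - 1              # 9 bits pick the cache line
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # 21 bits are stored as the tag


def split_address(addr):
    """Split a physical address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


a = 0x12345
b = a + CACHE_BYTES  # exactly one cache size further along in RAM

print(split_address(a))  # (2, 141, 5)
print(split_address(b))  # (3, 141, 5)
```

Both addresses land on cache line 141, and only the tag tells them apart. With these assumed sizes, 2^21 different RAM addresses compete for each cache location, which is the "RAM bytes / cache bytes" ratio.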
Nice video, you should show code examples in the videos too.
Thanks! Sure I'll consider adding some code examples in the future
Underrated channel
Thanks for the support! The channel is still young and hopefully it will grow soon.
Nice video :)
Thanks!
Best
I am your 423rd subscriber ❤
Welcome. I hope my videos are helpful
If you create additional objects, how do you keep them contiguous in memory? Allocate a bigger chunk? But if that chunk fills up, is there anything that can be done?
Take std::vector from the C++ STL as an example: you create objects in a small pre-allocated chunk of memory. When that memory is full, you allocate a bigger chunk (let's say twice the size you have right now), copy all existing objects to the new location, release the old memory, and continue creating objects in the new location.
This way objects are always stored next to each other in memory. Unfortunately, you have to pay the performance price when copying the data every time you allocate new memory.
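The grow-and-copy scheme described here can be sketched as a toy model in Python. This is not how std::vector is implemented; `GrowableArray`, its starting capacity of 1, and the doubling factor are assumptions for illustration, and real implementations choose their own growth factor.

```python
class GrowableArray:
    """Toy model of a contiguous array that doubles its capacity when full."""

    def __init__(self, capacity=1):
        self._data = [None] * capacity  # stands in for one contiguous block
        self.size = 0
        self.copies = 0                 # elements moved during reallocations

    @property
    def capacity(self):
        return len(self._data)

    def append(self, value):
        if self.size == self.capacity:
            # Full: allocate a block twice as large and copy everything over.
            bigger = [None] * (2 * self.capacity)
            for i in range(self.size):
                bigger[i] = self._data[i]
            self.copies += self.size
            self._data = bigger         # the old block is dropped here
        self._data[self.size] = value
        self.size += 1


arr = GrowableArray()
for n in range(1000):
    arr.append(n)

print(arr.capacity, arr.copies)  # 1024 1023
```

In this model, 1000 appends cause 1023 element copies in total, so the copying cost averages out to about one extra copy per element even though individual reallocations are expensive.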
@BitLemonSoftware Thank you very much! I see how it's done now.
Thank you
My pleasure!
When I studied computer architecture, we were always taught that the CPU talks to RAM directly, with not much said about cache. Only after I became a professional did I realize there are multiple levels of cache before RAM. I understand the importance of caching, but I wonder how much overhead the CPU incurs when a cache miss happens at every cache level. Wouldn't it be faster to fetch directly from RAM instead of looking through multiple levels of cache?
There is a very specific formula to figure out the Average Memory Access Time for a cache hierarchy.
To break even (RAM access is equal to cache access on average), you need a certain percent of cache hits. This percent is relatively low (probably low teens) so anything above that will justify cache access.
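The usual textbook form of that formula is AMAT = hit time + miss rate × miss penalty. Below is a minimal Python sketch of the break-even point under a simplified model in which every access first pays the full cache lookup cost and a miss then pays a RAM access on top. The cycle counts are made-up assumptions for illustration, not measurements.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays hit_time, misses pay extra."""
    return hit_time + miss_rate * miss_penalty


def break_even_hit_rate(lookup_time, ram_time):
    """Hit rate at which going through the cache costs the same as RAM alone.

    Solves lookup_time + (1 - h) * ram_time == ram_time for h.
    """
    return lookup_time / ram_time


LOOKUP_CYCLES = 30   # assumed total cost of checking all cache levels
RAM_CYCLES = 200     # assumed cost of one RAM access

h = break_even_hit_rate(LOOKUP_CYCLES, RAM_CYCLES)
print(h)  # 0.15
print(amat(LOOKUP_CYCLES, 1 - h, RAM_CYCLES))
```

With these assumed numbers, a 15% hit rate already makes the cache hierarchy cost the same on average as going to RAM every time, and any higher hit rate is a net win.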
Please, rearrange the CPU cache playlist in chronological order
Done! Thanks for pointing that out to me
u should have 1M subs.
Maybe some day
Subbed. Any chance you could recommend some related sources?
Nice🎉
Thanks!
What are the best low-level programming languages for managing cache resources? Trying to see if Rust would come up in that list.
You don't manage the cache. That's done by the CPU. You can assist it, though, by taking advantage of temporal and spatial locality: allocate and access your memory intelligently.
What do you mean by cache resources? The cache is completely controlled by hardware. Your choice of language won't have any direct impact on the behavior of the cache
Ok I see, so is it mostly memory/thread control from any low level code?
Languages like C/C++/Assembly give you the ability to control how the data in your program is arranged in memory. If you arrange it well, it can increase the efficiency of the cache.
I think threads can have an impact on cache, especially in situations where you run completely different code on different threads
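As a rough illustration of why data arrangement matters, the Python sketch below counts how many cache lines are touched when reading one 4-byte field from 1000 objects. The sizes (64-byte objects, 64-byte cache lines, data starting on a line boundary) and the name `cache_lines_touched` are assumptions for illustration; it models the layout arithmetic only, not real hardware.

```python
def cache_lines_touched(count, stride, field_size=4, line_size=64):
    """Count distinct cache lines hit when reading one field from each element.

    Element i's field is assumed to start at byte offset i * stride.
    """
    lines = set()
    for i in range(count):
        first = i * stride
        last = first + field_size - 1
        lines.update(range(first // line_size, last // line_size + 1))
    return len(lines)


objects = 1000
# Field embedded in 64-byte objects: consecutive fields are 64 bytes apart.
per_object = cache_lines_touched(objects, stride=64)
# Same field stored in its own packed array: consecutive fields are 4 bytes apart.
packed = cache_lines_touched(objects, stride=4)

print(per_object, packed)  # 1000 63
```

Under these assumptions, the packed layout touches about 16 times fewer cache lines for the same loop, which is the kind of gain a careful memory layout can give.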
Clarifying question: you keep mentioning that memory is slow, but is this just in comparison to how fast the cache can respond? RAM is supposed to be one of the fastest components, so it confused me a bit when you kept saying it was slow.
Sorry if I wasn't clear. You are right, when I say RAM is slow, it is always relative to cache.
You could also say that RAM is really fast, compared to hard disks.
Could you eliminate some cores if you increased the CPU cache?
This was such an insightful video! I really enjoyed the part about locality. It reminded me of a video I recently made on VPC, where I dive deeper into the core concepts of VPC and its implementation in AWS. If anyone’s interested, feel free to check it out - I’d love to hear your thoughts!
Thanks @BitLemonSoftware
@6:51 it should be 2^36 bits, not bytes.
64GB is equal to 2^36 Bytes, which are represented by 36 bits
@BitLemonSoftware You're right, I looked at that too quickly ^^.
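For anyone who wants to check the arithmetic from this exchange, a quick Python sanity check (treating 64 GB as 64 × 2^30 bytes, which is an assumption about the units used):

```python
GIB = 2 ** 30
ram_bytes = 64 * GIB

print(ram_bytes == 2 ** 36)          # True
print((ram_bytes - 1).bit_length())  # 36
```

The highest byte address is 2^36 - 1, and writing it down takes exactly 36 bits.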
How do I have a CS degree and have never heard of DOD?
I think CS degrees often focus on OOP because it's more intuitive
0:59 Factorio, one of the most optimized video games, uses OOP and inheritance, and most of the data belonging to an object is stored in one place. ECS and DOD are overrated techniques that limit creativity.
On the other hand, Minecraft uses an incredibly efficient ECS (EnTT) to store and manage its million-object maps.
Anyway, I wasn't trying to paint an accurate picture of how games organize data, just to highlight the difference between memory layouts and how it relates to the cache
Your accent sounds like you're Indian.
Ummm... Not even a little bit. It's actually Russian.
@BitLemonSoftware hahaha, it happens