5:11 this is Lumen isn't it? Except Lumen updates so slowly it looks like fairy lights and fireflies rather than torches, pointers, or fluorescent light flicker
I think you are correct that it's using voxel based GI, but it's not only VBGI. I think it's a group of techniques, woven together, and given a fancy name like Lumen. If I recall correctly Lumen has these components: software raytracing, SSR, SDF, and the fallback is some kind of voxel GI for acceleration, but I could be wrong
I can tell from the techniques I myself understand that you are not trying to explain anything and frankly I doubt you understand most of the things you're listing in this video. If so, what is the point? You're listing technical terms with a vague and sometimes even incorrect description, adding no value. The very first item is already a perfect example: Frustum culling does not cut objects in half! The visualization you show at 0:39 is simply wrong. Frustum culling would remove other rectangles NOT PICTURED. It wouldn't do anything to the rectangles shown. They are all at least partially inside the frustum so they would be rendered in full. You even explicitly say "WE CHECK AGAINST THE BOUNDING BOX OF EACH OBJECT", contradicting what's on screen. If you only check the bounding box this necessarily means that any overlap will lead to full rendering of the object. You're culling entire objects, not vertices. Text correct, video incorrect. Just one example. It would be exhausting to go through the whole video like this.
Thanks for mentioning this, and I’m sad to hear that you think it doesn’t add any value. I agree with you, the visuals were not always representing the tech correctly for various reasons.
So yes you're right about the frustum culling bit, but I tend to think of this video more as a glossary than a tutorial. It's a great checklist for any project that is experiencing graphical bottlenecks. You have to look into each technique anyway, so there's not a lot of harm done here.
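For anyone following the frustum culling exchange above, here is a minimal sketch of the bounding-box test being described. It assumes an axis-aligned box and a frustum given as planes whose normals point inward; `Plane`, `aabb_outside_plane` and `is_visible` are made-up names for illustration, not any engine's API. The thing to notice is that an object is either kept whole or dropped whole, nothing gets cut at the frustum edge.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    # Plane nx*x + ny*y + nz*z + d = 0, normal pointing into the frustum.
    nx: float
    ny: float
    nz: float
    d: float


def aabb_outside_plane(box_min, box_max, p):
    # Pick the box corner that lies furthest along the plane normal.
    px = box_max[0] if p.nx >= 0 else box_min[0]
    py = box_max[1] if p.ny >= 0 else box_min[1]
    pz = box_max[2] if p.nz >= 0 else box_min[2]
    # If even that corner is behind the plane, the whole box is outside.
    return p.nx * px + p.ny * py + p.nz * pz + p.d < 0


def is_visible(box_min, box_max, planes):
    # Conservative test: cull only if the box is fully outside some plane.
    return not any(aabb_outside_plane(box_min, box_max, p) for p in planes)


# Toy "frustum" (assumption): just the slab -1 <= x <= 1, as two planes.
planes = [Plane(1, 0, 0, 1), Plane(-1, 0, 0, 1)]

inside = is_visible((-0.5, 0, 0), (0.5, 1, 1), planes)     # fully inside
straddling = is_visible((0.5, 0, 0), (3.0, 1, 1), planes)  # partly inside
outside = is_visible((2.0, 0, 0), (3.0, 1, 1), planes)     # fully outside
```

The straddling box passes the test and would be rendered in full, which is exactly the point made about the animation.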
Yes, discarding when reading from an alpha texture with alpha 0 is a great optimization. Usually this is done on e.g. foliage, so a great thing one could do there as well is 'checkerboard discarding' based on distance from camera. Just be sure not to mess up the fragment shader's parallelism with this, as discarding is prone to branching
And this is why you will never make a game if you decide to optimize before you prove your prototype is good. Also why graphics programming is its own specialisation.
0:50 this is a bad explanation of frustum culling. The video shows the parts of an object that are outside the frustum getting cut off, while in reality almost no engine works that way. And the video title is misleading. It sounds like it's one technique which nobody knows about, when in the video there are at least 20 of them and every mid dev knows about them
The frustum part I totally agree with. I don't think the title is misleading, Optimization(s) plural. And not all videos are catered towards mid level, or entry level or senior. I try to do what I think is missing in the space.
0:30 Frustum Culling
0:52 Occlusion Culling
1:28 Distance Based Fog
1:42 Instancing
2:06 Batching
2:19 Dynamic Terrain Tessellation
3:03 Image Based Lighting
3:25 Light Probes
4:20 Light Mapping
4:37 Photon Mapping
4:57 Voxel Based Global Illumination
5:14 SSAO
5:27 Deferred Shading
5:49 Light Prepass
6:12 Acceleration Structures
6:33 Tiled Rendering
6:55 Clusters (Forward+)
7:18 Screen Space Reflection
7:29 Precomputed Radiance Transfer
7:50 Stencil Shadow Volumes
8:03 Shadow Atlas
8:22 Cascaded Shadow Maps
8:42 Variance Shadow Mapping
9:04 Mipmapping
9:15 Texture Channel Packing
9:46 Bindless Resources
10:08 Mega Textures
10:28 Resource Streaming
10:43 Sparse Virtual Textures
10:57 Optimizing Models
11:15 LOD
11:55 Caching
12:11 Minimizing State Changes
12:27 Branchless Shaders
12:54 Signed Distance Fields
13:05 Compute Shaders
13:32 Async Compute
14:02 Temporal Reprojection
14:27 FXAA
14:41 Hierarchical Z-Buffer
14:54 Depth Peeling
15:06 Bitwise transparency & Alpha Stripping
15:37 Logarithmic & Reverse Depth
16:01 Depth Prepass
Much appreciated! Well deserved sleep.
@@oskar_schramm Consider copypasting this list into the video's description so that RUclips would automatically form them into chapters; it's a handy feature lol
Its like an atlas of all the graphics optimizations.
@@635574 Exactly. I was about to say, that's a FANTASTIC overview of potential optimizations
Clicked on it out of curiosity expecting some rudimentary and well-trodden stuff, like all the videos saying "why 'game name x' is slow and how we can fix it" and it goes over highly basic stuff like culling, but you did go over a ton of various techniques. It's refreshing to see discussion on some nuances once in a while.
Like when I shipped a game on consoles that targeted strict 60 FPS and occlusion culling actually worsened the overall performance, so we didn't include it : )
Maybe one slight comment: branches aren't really a problem nowadays. Sort of. By themselves, branch code execution came a long way on GPU and can be scheduled much more elegantly by the GPU scheduler, unlocking many more optimization opportunities and techniques along the way.
The main problem with branches is still data dependency, as usually the algorithms that the user implements with branch usages are very dependent on data that cannot be fetched quickly. This in turn can further worsen performance with data which loading depends on branch outcomes if you're not careful enough. There's additional necessity to suspend threads and hide their latency because of that, and the GPUs can only do it so much before they hit massive pipeline stalls.
came in expecting to hear mostly things I already knew, but this was quite extensive! nice video
Half of them I didn't know or understand: very concise but very useful! Great.
Lovely video! Though heads up, for RUclips's "chapters" to work properly, it seems you need to have 0:00 marked in the description as well - I'd maybe label it "Intro" and put it right before your 0:30 timestamp.
Wow thanks! I was trying to figure this out for so long, you just made my day :)
I think your clock needs new batteries.
The man has stopped time to record the video 😮
I just recorded 1 minute everyday for 17 days at exactly 12.00 😎
I'm so glad I found you, this channel is a gold mine for fellow aspiring devs.
Modern optimizations in AAA games: check the dlss and nanite boxes in unreal engine 5
AAA games already use most of these techniques.
@@oskar_schramm dlss is cool and all but nanite is actually useless, harmful even
Nanite alone uses a bunch of these automatically.
Nanite can give you a worse frame rate, because it overdraws if you don’t know how to use it, and if your model topology is complex it’s pretty hard to optimize and it’ll kill the frame rate. So for a big game it’s a frame killer; it’s all of the other optimizations that are tanking for Nanite
I think this explains it pretty well. Although we have to realize that nanite isn’t theoretically a harmful thing, else it would not have been made.
It solves a lot of problems, but just as with any other type of advanced tech, it’s important to use it correctly, and if not, yes it can harm performance.
The same is true of e.g. z-prepass or occlusion culling, which is implemented by 90% of the industry, but if used incorrectly, can also hurt performance.
Acceleration structures almost always have an overhead before they give results.
It’s not just a checkbox for frames.
Great video!
I think we can mention shader program optimizations too:
- cycles
- branching
- per vertex/per fragment calculation trade off
- texture samplers and bindless textures
Also hardware specific optimizations, like for mobiles (tile architecture) we have
- hidden surface removal
- optimized MAD, dot, saturate shader instructions
- cheap MSAA on write only frame buffers
Glad you liked it. Absolutely great additions, thanks!
It's just too hard to list them all in one video :D
I was thinking of making a video in the future specifically about shader optimization, so will keep this in mind then!
As someone learning graphics programming this is a treasure trove of information, thanks for the video
I’m really happy you are finding value in it! Thanks for engaging, and hope you have a blast in the graphics realm!
If only he could explain it better.
@@No1001-w8m Do you have an example? If you mean the depth of each technique, I can make follow up videos for that, but if I were to do 50 techniques in depth in 1 video, that would be 1hour+
Wow, you deserve more subscribers. This video is very high quality and educational, no bs. Hats off to you sir
Thanks for the kind words, and glad you enjoyed it!
well, there is some bs in there, "stay till the end to find out this one simple trick youtubers dont want you to know"-sort of bs.
Finally voxels getting some love ❤
Liked, Subscribed, saved the video :D Great compilation of these techniques.
Nice Video, I thought that I would know pretty much about game development, but most of the things included in your video, I've never even heard.
invaluable video for those learning graphics programming
Note, sparse virtual textures and megatextures are the same thing.
also, shader branching is complicated. Branches are actually very cheap, what's expensive is divergence across a wave. That is - if neighbouring pixels all take the same branch - it's cheap, if they take different branches - it's expensive.
Thanks for letting me know, yeah realized the mega texture part a week or 2 after the video. About branching, you mentioned pixels but I guess the same is true for any code that uses the cuda cores? So if a thread in a warp takes a different path than the others, they cannot run in lockstep right? And fragment shaders are run in a 2x2 thread block, is that 4 threads, or a warp sharing that 2x2 block, if you know?
@@oskar_schramm The concept of a warp or whatever roughly exists because of hardware mapping. In hardware a bunch of cores share a large piece of cache, so no matter if it's cuda, pure compute shaders or fragment shaders - they share a larger than 2x2 (4) group. Actual sizes will vary depending on architecture, even between generations of the same architecture things change sometimes.
But yeah, you got it exactly right - if there is divergence both paths will be executed more or less, so the timing will be worse than max(x,y), it will be x+y roughly.
You typically want as little divergence as possible anyway, because divergence slows everything down, not just branches, your texture caches get hammered as well.
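A toy cost model of the "x+y instead of max(x,y)" point, for anyone who wants to see it in numbers. This is only a sketch under simplifying assumptions: one wave running in lockstep, fixed costs per side of the branch, no scheduling or memory effects. `wave_cost` is a made-up helper, not a real API.

```python
def wave_cost(takes_branch, cost_if, cost_else):
    """Toy cost of an if/else for one wave executing in lockstep.

    takes_branch holds one boolean per lane. If every lane agrees, only that
    side runs. If lanes disagree, the wave runs both sides with the inactive
    lanes masked off, so the two costs add up.
    """
    any_if = any(takes_branch)
    any_else = not all(takes_branch)
    return (cost_if if any_if else 0) + (cost_else if any_else else 0)


coherent = [True] * 32              # all lanes take the "if" side
divergent = [True] * 31 + [False]   # a single lane takes the "else" side

print(wave_cost(coherent, 10, 40))   # 10
print(wave_cost(divergent, 10, 40))  # 50
```

One stray lane is enough to make the whole wave pay for both sides, which is why coherence between neighbouring pixels matters more than the branch itself.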
A good recent example would be ray tracing, pretty much every RTX api today will sort rays before dispatch to reduce divergence, because even with the cost of sorting you tend to get ~20% performance boost due to better data locality. If you're interested, there's a lot of literature out there on the topic, "Ray Tracing Gems" has some introduction on it. The actual sorting is done based on direction and origin, rays that originate close together are more likely to hit the same data, and rays that go in the same direction share the same property.
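A rough sketch of what "sort by origin and direction" could look like. Everything here is an assumption for illustration: the coarse grid cell plus direction octant key, the `cell_size` parameter and the function names are invented, and real implementations use their own (usually hardware or driver specific) schemes.

```python
import math


def ray_sort_key(origin, direction, cell_size=1.0):
    # Coarse grid cell of the origin: rays that start close together share a cell.
    cell = tuple(math.floor(c / cell_size) for c in origin)
    # Direction octant: one bit per negative axis, so similar directions group up.
    octant = sum((1 << i) for i, c in enumerate(direction) if c < 0)
    return (cell, octant)


def sort_rays(rays, cell_size=1.0):
    # rays is a list of (origin, direction) tuples.
    return sorted(rays, key=lambda r: ray_sort_key(r[0], r[1], cell_size))


rays = [
    ((5.2, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((0.1, 0.0, 0.0), (-1.0, 0.0, 0.0)),
    ((5.4, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((0.3, 0.0, 0.0), (-1.0, 0.0, 0.0)),
]
sorted_rays = sort_rays(rays)
```

After sorting, the two rays near x=0 end up next to each other and so do the two near x=5, so neighbouring lanes are more likely to touch the same data.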
Regarding performance, I would recommend looking into the following topics as well:
1. Temporal and Spatial integration (TAA is an example of that, another would be straight-up upscaling)
2. Meshlet shading. You brought up nanite, which used meshlets (they call them clusters)
3. Variable-rate shading
4. Texture compression. This is something that most platforms will handle for you, but if we're talking about graphics engineering - it's a very valuable technique. Hardware supports texture compression out of the box, and it's actually faster than non-compressed access, as there's less data per pixel.
5. Texture streaming. The idea is to only load MIPs that are currently needed, and skip the full resolution levels, this makes load times much faster and helps eliminate a lot of FPS spikes when you need to load some new textures as we rarely actually need the full 4k or 8k textures since we're looking at things from far away
6. Frame graphs. These are standard today as well, there's a good talk from EA I think from over a decade ago. Both Unity and Unreal use frame graphs, and so do most modern engines. The reason is resource utilization and pipeline flexibility. You can use waaay less memory by reusing render targets between passes and your code becomes more modular. There's an extra perf benefit as you're more likely to have the render target in your caches already when starting a pass.
There's a lot more, I suggest checking out "Advances in Real-time Graphics" section of SIGGRAPH, they have a separate course every year with industry presentations from companies like Epic and EA, lots of truly amazing stuff and it's all in open access.
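To put a number on the texture streaming point (item 5 above): a small sketch that sums the memory of a mip chain and shows what skipping the top levels saves. It assumes an uncompressed texture with a fixed number of bytes per pixel; `mip_chain_bytes` and `first_mip` are made-up names.

```python
def mip_chain_bytes(width, height, bytes_per_pixel, first_mip=0):
    """Bytes for a mip chain, skipping the levels finer than first_mip."""
    total = 0
    level = 0
    w, h = width, height
    while True:
        if level >= first_mip:
            total += w * h * bytes_per_pixel
        if w == 1 and h == 1:
            break
        # Each mip level halves both dimensions, down to 1x1.
        w, h = max(1, w // 2), max(1, h // 2)
        level += 1
    return total


full = mip_chain_bytes(4096, 4096, 4)                  # every level of a 4k RGBA8 texture
streamed = mip_chain_bytes(4096, 4096, 4, first_mip=2) # start at the 1024x1024 level

print(full)      # 89478484
print(streamed)  # 5592404
```

Dropping just the two finest levels cuts the resident size by roughly 16x, which is why streaming helps so much when most surfaces are seen from far away.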
These are lovely technical optimization techniques, but we also have to take texture usage into account, like using trim sheets, and model stuff in a modular way to be repetitive but also full of variation and customizability, to allow instancing, and all culling types.
Absolutely, great addition. There is just too much to cover in 1 video, will probably make a follow up
Thank you. Perfect and short explanation.
Hey, glad you appreciate the fast paced format
It's interesting how a game like Battlefield 1 from DICE can be so pretty and yet so optimized that it runs on very simple hardware without stutters, graphical glitches, or performance issues such as frame drops or frametime spikes.
Then you go and look at CS2, which is yet to be optimized: whereas Battlefield 1 can easily run with 62 players in a match, CS2 struggles with 10.
Not to mention that CS2's performance has only gotten worse with subsequent updates. So sad.
Valve has all the time they want/need to optimize, yet they don't, while DICE was working within a time limit and still implemented so much, with way more effort.
Edit: All games have turned to DLSS and FSR for help, when those don't actually help overall performance, because you're only optimizing resolution performance for the GPU while the CPU keeps chugging like crazy. Every single game nowadays is not GPU limited but CPU limited, with GPU performance features that don't help if you're already CPU limited because the game is not optimized via distant LODs, occlusion culling and way more...
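The upscaling argument above can be put as a very simplified model: if CPU and GPU work overlap, the slower of the two sets the frame time, and an upscaler only shrinks the GPU side. This is a sketch with invented numbers and function names, and it ignores real effects like sync points and driver overhead.

```python
def frame_time_ms(cpu_ms, gpu_ms):
    # Simplified: CPU and GPU work overlap, so the slower side sets the frame time.
    return max(cpu_ms, gpu_ms)


def with_upscaling(cpu_ms, gpu_ms, gpu_scale):
    # Rendering at a lower resolution scales (roughly) the GPU time only.
    return frame_time_ms(cpu_ms, gpu_ms * gpu_scale)


# GPU-bound case: halving GPU time helps until the CPU becomes the limit.
print(frame_time_ms(8, 20), with_upscaling(8, 20, 0.5))    # 20 10.0

# CPU-bound case: the same upscaling changes nothing.
print(frame_time_ms(20, 12), with_upscaling(20, 12, 0.5))  # 20 20
```

Whether a given game is actually CPU limited has to be measured; the model only shows why upscaling cannot help in that case.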
thank you! This gives me a lot to explore
Glad you enjoyed it. And finding things to explore more was exactly what I was going for :)
Guh, I hate screen space reflections. Always so weird when you look at water and stuff in the reflection just disappears as it goes off the screen
Yeah, SSR is well known for being hated, both by players and by developers, but it does serve its purpose, and with a proper reflection implementation, it should smoothly transition to a fallback like reflection probes to limit artifacts. Also, it's one of the simplest realtime reflection techniques we have that isn't raytracing.
I think SSR is great. The problem comes from expectations. If you look at a mirror-like surface, such as water, you can see a lot of detail in reflections, and you expect them to be accurate. If you look at brushed metal, you won't see much detail at all. You expect the detail, and when it's incorrect or disappears - that's frustrating.
Why I love SSR - it makes at least a portion of the screen much more realistic. Reflections are a part of the real world, we do a lot of hacks and hand-wavy approximations, and SSR is much more faithful to reality. Things like Lumen in Unreal use SSR, and fill areas that don't have sufficient information with other techniques, such as ray tracing and environment maps.
"Real-time graphics" has a limitation, it's in the name "real-time". We have to make compromises. In film they spend ~1h per frame on very powerful hardware, for games that's not an option, so we can't use the same techniques. We have to cheat. And the goal is to produce a pretty picture first, and realism second. So SSR is a good compromise in that direction, it makes the picture more pretty by adding extra color variation in the frame.
@@AlexGoldring
I'm not talking about brushed metal, I'm talking about a body of water for example.
You look at it, you see the cliff above it reflected, you pan the camera down a bit more, the cliff vanishes as it's too far off screen, feeling very weird. (And no, it's not from the different angle causing no reflections, the sky behind the cliff is now reflected, is all).
Idk, I get there are limitations, but I feel like if you frequently see such jarring reflection changes when moving around big bodies of water, and there are a good number of such bodies, the jarringness of the change can be worse than having no reflection to begin with.
This is a brilliant video, very useful. But that clock in the background not moving distracted me the entire time.
Thanks, glad you liked it! Haha, yeah I just hate the tick of it, so plugged it out. Now it's just a not very appealing wall decor, but it’s always high noon here 🤠
@@oskar_schramm based McCree ref
he managed to squeeze 17mins into less than 1 minute. true optimisation
I feel like the Frustum Culling animation was wrong (or oversimplified) even though the spoken text was correct. As I understand, culling is not just about what gets rendered, but also about what gets sent to the GPU. If the game uses meshes (most games) then an intersecting mesh will still get sent to the GPU and processed. If the game uses meshlets (Alan Wake 2 and UE5 games with Nanite), meshlets that aren't inside the frustum and don't intersect it get culled.
Yes you are right, the animation was not representative.
Deferred shading is usually slower than forward shading. You need large g-buffers in order to do it, and transferring all that memory is very slow.
@@michawhite7613 Not necessarily.
Memory bandwidth is not everything, especially in PC games.
You are correct in the fact that forward can be better, but it’s usually in the case of multiple other optimization techniques supporting it, like clusters and zprepass.
Deferred would not be an industry standard if it was worse.
@@oskar_schramm Making a game around forward shading will give you better FPS than deferred shading. Only time deferred is better is if you're using more complex (and less optimised) lighting. The industry standard for mobile games is forward
@@DevLancelot Unless something drastically changed, last time I checked Deferred Rendering was much faster at rendering scenes with multiple lights. With the drawback of rendering geometry and light as he explained, and the transparency limitation. Maybe in newer hardware this is no longer perceived, but 15 years ago, if you had multiple lights, you had to go the deferred way.
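A back-of-the-envelope sketch of the trade-off argued in this thread. It assumes a naive forward renderer (no clustering, no depth prepass) where every rasterized fragment evaluates every light, versus a deferred renderer that lights each visible pixel once per light but pays for a G-buffer. The overdraw factor, light count and 16 bytes per pixel are invented numbers, and the function names are made up.

```python
def forward_light_evals(pixels, overdraw, lights):
    # Naive forward: every fragment, including ones overwritten later, evaluates every light.
    return pixels * overdraw * lights


def deferred_light_evals(pixels, lights):
    # Deferred: lighting runs once per visible pixel per light, reading the G-buffer.
    return pixels * lights


def gbuffer_bytes(pixels, bytes_per_pixel):
    # The memory (and bandwidth) price deferred pays up front.
    return pixels * bytes_per_pixel


pixels = 1920 * 1080
print(forward_light_evals(pixels, 3, 50))  # 311040000
print(deferred_light_evals(pixels, 50))    # 103680000
print(gbuffer_bytes(pixels, 16))           # 33177600
```

With a depth prepass the overdraw factor for forward drops towards 1, and with clustering the light count per pixel drops too, which is the case where forward catches up. On bandwidth-constrained hardware the G-buffer term is what hurts deferred.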
Very nice video. I only knew half of these!
just impressive, thank you very much
Thanks! Glad you liked it.
Great video!
I wonder how many (if any) are being done automagically already in Unreal engine when you first start it up with a template?
Oh yeah, 100%, Unreal uses many of these.
I can’t say for certain which ones they use, but what I can say is that there isn’t a 3D game engine that doesn’t do frustum culling.
Deferred shading, instancing and/or batching, shadow atlasing and LODs are some other things they very likely use by default.
7:59 - "But it cannot do soft shadows". What a horrible lie. The Dark Mod did it; you can enable soft shadows with either the stencil shadows mode or shadow mapping.
You’re right that The Dark Mod achieved soft shadows, but it’s important to clarify: stencil volume shadows alone create hard edges due to their binary nature. The Dark Mod used extra techniques like penumbra widening and post-processing blurs to mimic soft shadows. So, while stencil volumes can be enhanced, they don’t natively support soft shadows without these added tricks. Thanks for mentioning it, though; I will try to be more specific in the future.
wish I could tell A.I. to generate a github video game demo showing each of these optimizations with examples
wdym with GitHub video game?
@@glitchdev github project containing video game files
@@ytubeanon ah I see
Yeah, I do miss having demos/more practical examples in my videos; I will see what I can do in the future.
Screen space reflections are extremely distracting imho.
You mean reflections as a whole or specifically SSR? Because with a proper reflection implementation you shouldn’t be able to see the difference.
Obviously you will see a lot of artifacts either way with screen space solutions, but games can always do SSR with e.g. a reflection probe as a fallback to handle the artifacts.
SSR is a pretty ugly solution to the problem altogether, but it works in some cases and it’s faster than other solutions, so that’s why we use it.
Agreed. I'd rather just have cube maps because the abnormal cutoff with SSR is too ugly.
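The probe fallback mentioned in this sub-thread can be sketched as a simple blend. This is a minimal illustration under stated assumptions, not any engine's implementation: `edge_fade`, `resolve_reflection` and the `border` parameter are hypothetical names, the screen-space ray march is assumed to have already produced `hit_uv` (or `None` on a miss), and colours are plain RGB tuples.

```python
from typing import Optional, Tuple

Color = Tuple[float, float, float]


def edge_fade(u: float, v: float, border: float = 0.1) -> float:
    """0.0 at the screen edge, rising to 1.0 once the hit is `border` away from every edge."""
    distance_to_edge = min(u, 1.0 - u, v, 1.0 - v)
    return max(0.0, min(1.0, distance_to_edge / border))


def resolve_reflection(
    hit_uv: Optional[Tuple[float, float]],
    ssr_color: Color,
    probe_color: Color,
    border: float = 0.1,
) -> Color:
    # The ray found nothing on screen: use the reflection probe (e.g. a cube map).
    if hit_uv is None:
        return probe_color
    # Otherwise blend towards the probe as the hit approaches the screen edge,
    # so the reflection fades out instead of being cut off abruptly.
    w = edge_fade(hit_uv[0], hit_uv[1], border)
    return tuple(w * s + (1.0 - w) * p for s, p in zip(ssr_color, probe_color))


ssr = (1.0, 0.0, 0.0)    # hypothetical colour fetched from the screen
probe = (0.0, 0.0, 1.0)  # hypothetical colour fetched from the probe
```

A hit in the middle of the screen returns the SSR colour unchanged, a miss or a hit right on the edge returns the probe colour, and hits in between are mixed. The fade only hides the hard cutoff; it does not recover detail the probe never captured.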
well done
Thanks :)
5:11 this is Lumen isn't it? Except Lumen updates so slowly it looks like fairy lights and fireflies rather than torches, pointers, or fluorescent light flicker
I think you are correct that it's using voxel-based GI, but it's not only VBGI.
I think it's a group of techniques, woven together, and given a fancy name like Lumen. If I recall correctly, Lumen has these components:
Software Raytracing, SSR, SDF, and the fallback is some kind of voxel GI for acceleration, but I could be wrong
so helpful
Glad it is serving its purpose :)
so why exactly do i NEED TO KNOW this?
I can tell from the techniques I myself understand that you are not trying to explain anything and frankly I doubt you understand most of the things you're listing in this video. If so, what is the point? You're listing technical terms with a vague and sometimes even incorrect description, adding no value. The very first item is already a perfect example: Frustum culling does not cut objects in half! The visualization you show at 0:39 is simply wrong. Frustum culling would remove other rectangles NOT PICTURED. It wouldn't do anything to the rectangles shown. They are all at least partially inside the frustum so they would be rendered in full. You even explicitly say "WE CHECK AGAINST THE BOUNDING BOX OF EACH OBJECT", contradicting what's on screen. If you only check the bounding box this necessarily means that any overlap will lead to full rendering of the object. You're culling entire objects, not vertices. Text correct, video incorrect. Just one example. It would be exhausting to go through the whole video like this.
Thanks for mentioning this, and I’m sad to hear that you think it doesn’t add any value. I agree with you, the visuals were not always representing the tech correctly for various reasons.
So yes you're right about the frustum culling bit, but I tend to think of this video more as a glossary than a tutorial. It's a great checklist for any project that is experiencing graphical bottlenecks. You have to look into each technique anyway, so there's not a lot of harm done here.
Shave off the transparent bits of alpha-clipped meshes.
Yes, discarding when reading alpha 0 from an alpha texture is a great optimization. This is typically used on e.g. foliage, so a great thing one could do there as well is ’checkerboard discarding’ based on distance from the camera. Just be sure not to mess up the fragment shader's parallelism with this, as discarding is prone to branching.
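One possible reading of the alpha discard plus distance-based checkerboard idea, written as plain Python rather than shader code so it can be run anywhere. The function name, `alpha_cutoff` and `checker_start` are assumptions invented for this sketch, and the exact pattern the reply has in mind may differ.

```python
def keep_fragment(
    x: int,
    y: int,
    alpha: float,
    distance: float,
    alpha_cutoff: float = 0.5,
    checker_start: float = 50.0,
) -> bool:
    """Return True if the fragment at pixel (x, y) should be shaded."""
    # Alpha clip: fragments below the cutoff are discarded outright.
    if alpha < alpha_cutoff:
        return False
    # Far from the camera, additionally drop every other pixel in a
    # checkerboard pattern to halve the shading work on distant foliage.
    if distance >= checker_start and (x + y) % 2 == 1:
        return False
    return True


# Count surviving fragments in a 4x4 block of fully opaque foliage pixels.
near = sum(keep_fragment(x, y, 1.0, 10.0) for x in range(4) for y in range(4))
far = sum(keep_fragment(x, y, 1.0, 80.0) for x in range(4) for y in range(4))
```

In this sketch all 16 fragments survive up close and 8 survive at a distance. In a real fragment shader the two `if` statements are the branches the reply warns about, so it is worth profiling before assuming the discard is a net win.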
you should rename the video from this to "list of optimizations used in 3d engines" :|
Bro presenting standard and very basic optimisation techniques like they are genius -_- i hate 2024
And this is why you will never make a game if you decide to optimize before you prove your prototype is good. Also why graphics programming is its own specialisation.
0:50 this is a bad explanation of frustum culling. The video shows the parts of an object that are not in the frustum getting cut off, while in reality almost no engine works that way. And the video title is misleading. It sounds like it's one technique which nobody knows about, when in the video there are at least 20 of them and every mid dev knows about them
The frustum part I totally agree with. I don't think the title is misleading: Optimization(s), plural. And not all videos are catered towards mid level, or entry level or senior. I try to do what I think is missing in the space.
Well-researched video and well presented, and so few likes?
Thanks for the kind words!
As long as someone learns from it and/or finds it interesting I'm happy!
First!!! Lol