Genius Graphics Optimizations You NEED TO KNOW

Oskar Schramm

24K views

Published: Nov 20, 2024

Comments • 102

@perli216 1 month ago ⁺⁸³
0:30 Frustum Culling
0:52 Occlusion Culling
1:28 Distance Based Fog
1:42 Instancing
2:06 Batching
2:19 Dynamic Terrain Tessellation
3:03 Image Based Lighting
3:25 Light Probes
4:20 Light Mapping
4:37 Photon Mapping
4:57 Voxel Based Global Illumination
5:14 SSAO
5:27 Deferred Shading
5:49 Light Prepass
6:12 Acceleration Structures
6:33 Tiled Rendering
6:55 Clusters (Forward+)
7:18 Screen Space Reflection
7:29 Precomputed Radiance Transfer
7:50 Stencil Shadow Volumes
8:03 Shadow Atlas
8:22 Cascaded Shadow Maps
8:42 Variance Shadow Mapping
8:04 Mipmapping
9:15 Texture Channel Packing
9:46 Bindless Resources
10:08 Mega Textures
10:28 Resource Streaming
10:43 Sparse Virtual Textures
10:57 Optimizing Models
11:15 LOD
11:55 Caching
12:11 Minimizing State Changes
12:27 Branchless Shaders
12:54 Signed Distance Fields
13:05 Compute Shaders
13:32 Async Compute
14:02 Temporal Reprojection
14:27 FXAA
14:41 Hierarchical Z-Buffer
14:54 Depth Peeling
15:06 Bitwise transparency & Alpha Stripping
15:37 Logarithmic & Reverse Depth
16:01 Depth Prepass
@oskar_schramm 1 month ago ⁺³
Much appreciated! Well deserved sleep.
@Architector_4 1 month ago ⁺⁴
@@oskar_schramm Consider copy-pasting this list into the video's description so that YouTube automatically turns it into chapters; it's a handy feature lol
@635574 26 days ago ⁺¹
It's like an atlas of all the graphics optimizations.
@marcsh_dev 19 days ago
@@635574 Exactly. I was about to say, that's a FANTASTIC overview of potential optimizations
@Kolyasisan 20 days ago ⁺⁷
Clicked on it out of curiosity expecting some rudimentary and well-trodden stuff, like all the videos that say "why 'game name x' is slow and how we can fix it" and then only go over highly basic stuff like culling, but you went over a ton of different techniques. It's refreshing to see discussion of some nuances once in a while.
Like when I shipped a game on consoles that targeted a strict 60 FPS and occlusion culling actually worsened the overall performance, so we didn't include it : )
@Kolyasisan 20 days ago ⁺¹
Maybe one slight comment: branches aren't really a problem nowadays. Sort of. Branch execution itself has come a long way on GPUs and can be scheduled much more elegantly by the GPU scheduler, unlocking many more optimization opportunities and techniques along the way.
The main problem with branches is still data dependency, as the algorithms that users implement with branches usually depend heavily on data that cannot be fetched quickly. If you're not careful, this gets even worse when which data is loaded depends on the branch outcome. Threads then have to be suspended to hide that latency, and GPUs can only do so much of that before they hit massive pipeline stalls.
@HiHi-iu8gf 18 days ago ⁺²
came in expecting to hear mostly things I already knew, but this was quite extensive! nice video
@dragoons_net 13 days ago ⁺¹
Half of them I didn't know or understand: very concise but very useful! Great.
@LandyRShambles 8 days ago ⁺¹
Lovely video! Though heads up, for YouTube's "chapters" to work properly, it seems you need to have 0:00 marked in the description as well - I'd maybe label it "Intro" and put it right before your 0:30 timestamp.
@oskar_schramm 7 days ago ⁺¹
Wow thanks! I was trying to figure this out for so long, you just made my day :)
@Bluehawk2008 22 days ago ⁺³
I think your clock needs new batteries.
@DriftJunkie 12 days ago
The man has stopped time to record the video 😮
@oskar_schramm 9 days ago
I just recorded 1 minute every day for 17 days at exactly 12:00 😎
@erneth3303 7 days ago
I'm so glad I found you, this channel is a gold mine for fellow aspiring devs.
@rulonoboev1783 1 month ago ⁺⁹⁷
Modern optimizations in AAA games: check the DLSS and Nanite boxes in Unreal Engine 5
@namelessalias0007 1 month ago ⁺⁷
AAA games already use most of these techniques.
@nbshftr 1 month ago
@@oskar_schramm DLSS is cool and all but Nanite is actually useless, harmful even
@OverJumpRally 1 month ago ⁺²
Nanite alone uses a bunch of these automatically.
@wouf_ 1 month ago ⁺¹⁵
Nanite can give you a worse frame rate, because it overdraws if you don't know how to use it, and if your model topology is complex it's pretty hard to optimize and it'll kill the frame rate. So for a big game it's a frame killer; it's all of the other optimizations that are tanking for Nanite.
@oskar_schramm 1 month ago ⁺⁹
I think this explains it pretty well. Although we have to realize that Nanite isn't a harmful thing in theory, or it would not have been made.
It solves a lot of problems, but just as with any other type of advanced tech, it's important to use it correctly, and if you don't, yes, it can harm performance.
The same is true of, e.g., z-prepass or occlusion culling, which are implemented by 90% of the industry but, if used incorrectly, can also hurt performance.
Acceleration structures almost always have an overhead before they give results.
It's not just a checkbox for frames.
@xabblll 1 month ago ⁺¹⁴
Great video!
I think we can mention shader program optimizations too:
- cycles
- branching
- per-vertex/per-fragment calculation trade-off
- texture samplers and bindless textures
Also hardware-specific optimizations; for mobile (tile-based architecture), for example, we have
- hidden surface removal
- optimized MAD, dot, saturate shader instructions
- cheap MSAA on write-only frame buffers
@oskar_schramm 1 month ago ⁺²
Glad you liked it. Absolutely great additions, thanks!
It's just too hard to list them all in one video :D
I was thinking of making a video in the future specifically about shader optimization, so I will keep this in mind then!
@RamunDev 2 months ago ⁺³²
As someone learning graphics programming this is a treasure trove of information, thanks for the video
@oskar_schramm 2 months ago ⁺³
I’m really happy you are finding value in it! Thanks for engaging, and hope you have a blast in the graphics realm!
@No1001-w8m 1 month ago
If only he could explain it better.
@oskar_schramm 1 month ago ⁺¹
@@No1001-w8m Do you have an example? If you mean the depth of each technique, I can make follow-up videos for that, but if I were to cover 50 techniques in depth in 1 video, that would be 1 hour+
@helloitshecker 1 month ago ⁺⁷
Wow, you deserve more subscribers. This video is very high quality and educational, no bs. Hats off to you sir
@oskar_schramm 1 month ago
Thanks for the kind words, and glad you enjoyed it!
@mostrealtutu 1 month ago
well, there is some bs in there, "stay till the end to find out this one simple trick youtubers don't want you to know"-sort of bs.
@dokgo7822 20 days ago ⁺¹
Finally voxels getting some love ❤
@mihalydozsa2254 1 month ago ⁺¹
Liked, Subscribed, saved the video :D Great compilation of these techniques.
@glitchdev 1 month ago ⁺²
Nice video. I thought I knew quite a lot about game development, but I had never even heard of most of the things in your video.
@bananaboy482 1 month ago ⁺¹
invaluable video for those learning graphics programming
@AlexGoldring 26 days ago ⁺²
Note, sparse virtual textures and megatextures are the same thing.
Also, shader branching is complicated. Branches are actually very cheap; what's expensive is divergence across a wave. That is - if neighbouring pixels all take the same branch - it's cheap, if they take different branches - it's expensive.
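To make the divergence point concrete, here is a minimal toy cost model, not real GPU code. It assumes a wave of 32 threads running in lockstep and that a branch path costs time whenever at least one thread takes it; `warp_cost`, `WARP_SIZE`, and the cycle counts are hypothetical names and numbers chosen for illustration.

```python
def warp_cost(takes_branch, cost_if, cost_else):
    """Estimate the cost of one wave executing an if/else in lockstep.

    takes_branch: one boolean per thread in the wave.
    A path is paid for as soon as at least one thread needs it.
    """
    any_if = any(takes_branch)
    any_else = not all(takes_branch)
    return (cost_if if any_if else 0) + (cost_else if any_else else 0)


WARP_SIZE = 32  # assumption: real wave sizes vary by architecture

uniform = [True] * WARP_SIZE                         # all threads agree
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]   # threads disagree

print(warp_cost(uniform, 10, 40))    # 10: only the "if" path runs
print(warp_cost(divergent, 10, 40))  # 50: both paths run back to back
```

In this model the branch costs nothing extra when the wave agrees, and the divergent case pays for both sides.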
@oskar_schramm 26 days ago
Thanks for letting me know, yeah, I realized the megatexture part a week or two after the video. About branching, you mentioned pixels but I guess the same is true for any code that uses the CUDA cores? So if a thread in a warp takes a different path than the others, they cannot run in lockstep, right? And fragment shaders are run in a 2x2 thread block, is that 4 threads, or a warp sharing that 2x2 block, if you know?
@AlexGoldring 26 days ago ⁺¹
@@oskar_schramm The concept of a warp or whatever roughly exists because of hardware mapping. In hardware a bunch of cores share a large piece of cache, so no matter if it's CUDA, pure compute shaders or fragment shaders - the group they share is larger than 2x2 (4). Actual sizes will vary depending on architecture, even between generations of the same architecture things change sometimes.
But yeah, you got it exactly right - if there is divergence, both paths will be executed, more or less, so the timing will be worse than max(x,y); it will be roughly x+y.
You typically want as little divergence as possible anyway, because divergence slows everything down, not just branches; your texture caches get hammered as well.
A good recent example would be ray tracing: pretty much every RTX API today will sort rays before dispatch to reduce divergence, because even with the cost of sorting you tend to get a ~20% performance boost due to better data locality. If you're interested, there's a lot of literature out there on the topic; "Ray Tracing Gems" has some introduction to it. The actual sorting is done based on direction and origin: rays that originate close together are more likely to hit the same data, and rays that go in the same direction share the same property.
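A minimal CPU-side sketch of sorting rays by origin and direction follows. It is an illustration under stated assumptions, not how any particular ray tracing API does it: the key (a coarse grid cell for the origin plus a direction octant), the `cell_size` value, and the function name `ray_sort_key` are all hypothetical.

```python
def ray_sort_key(origin, direction, cell_size=4.0):
    """Group rays that start close together and point the same way."""
    # Quantize the origin to a coarse grid cell.
    cell = tuple(int(c // cell_size) for c in origin)
    # Direction octant: one bit per axis that points in the negative direction.
    octant = sum(1 << i for i, d in enumerate(direction) if d < 0)
    return (cell, octant)


rays = [
    ((0.5, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((9.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),
    ((1.0, 0.5, 0.0), (1.0, 0.2, 0.0)),
    ((9.5, 0.0, 0.0), (-1.0, 0.1, 0.0)),
]

# After sorting, rays 0 and 2 (same cell, same octant) sit next to each
# other, and so do rays 1 and 3.
sorted_rays = sorted(rays, key=lambda r: ray_sort_key(*r))
```

Neighbouring rays in the sorted list are then more likely to traverse the same part of the scene, which is the locality benefit described above.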
Regarding performance, I would recommend looking into the following topics as well:
1. Temporal and spatial integration (TAA is an example of that, another would be straight-up upscaling)
2. Meshlet shading. You brought up Nanite, which uses meshlets (they call them clusters)
3. Variable-rate shading
4. Texture compression. This is something that most platforms will handle for you, but if we're talking about graphics engineering - it's a very valuable technique. Hardware supports texture compression out of the box, and it's actually faster than non-compressed access, as there's less data per pixel.
5. Texture streaming. The idea is to only load the MIPs that are currently needed and skip the full-resolution levels. This makes load times much faster and helps eliminate a lot of FPS spikes when you need to load some new textures, as we rarely actually need the full 4k or 8k textures since we're looking at things from far away
6. Frame graphs. These are standard today as well, there's a good talk from EA I think from over a decade ago. Both Unity and Unreal use frame graphs, and so do most modern engines. The reason is resource utilization and pipeline flexibility. You can use waaay less memory by reusing render targets between passes, and your code becomes more modular. There's an extra perf benefit as you're more likely to have the render target in your caches already when starting a pass.
There's a lot more, I suggest checking out the "Advances in Real-time Graphics" section of SIGGRAPH, they have a separate course every year with industry presentations from companies like Epic and EA, lots of truly amazing stuff and it's all open access.
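The render-target reuse from point 6 can be sketched as a lifetime-based allocation. This is a toy model, not any engine's frame graph: it assumes every target has the same size and format, that passes run in list order, and that a physical slot can be handed to a new target once the old one is no longer used. The pass names and `allocate_targets` are hypothetical.

```python
def allocate_targets(passes):
    """passes: list of (pass_name, [virtual targets used]) in execution order.

    Returns {virtual target: physical slot index}.
    """
    first, last = {}, {}
    for i, (_, targets) in enumerate(passes):
        for t in targets:
            first.setdefault(t, i)  # first pass that touches the target
            last[t] = i             # last pass that touches the target

    slot_busy_until = []  # per physical slot: index of the last pass using it
    assignment = {}
    for t in sorted(first, key=first.get):
        for s, busy_until in enumerate(slot_busy_until):
            if busy_until < first[t]:        # slot is free again: reuse it
                slot_busy_until[s] = last[t]
                assignment[t] = s
                break
        else:                                # no free slot: allocate a new one
            slot_busy_until.append(last[t])
            assignment[t] = len(slot_busy_until) - 1
    return assignment


passes = [
    ("gbuffer",  ["gbuffer"]),
    ("ssao",     ["gbuffer", "ao"]),
    ("lighting", ["gbuffer", "ao", "hdr"]),
    ("bloom",    ["hdr", "bloom"]),
    ("tonemap",  ["hdr", "bloom", "ldr"]),
]

assignment = allocate_targets(passes)
print(len(set(assignment.values())))  # 3 physical targets back 5 virtual ones
```

Here "bloom" lands in the slot that "gbuffer" used, because the G-buffer is no longer needed after the lighting pass.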
@slayth6332 1 month ago ⁺¹
These are lovely technical optimization techniques, but we also have to take texture usage into account, like using trim sheets, and model things in a modular way so they are repetitive but also full of variation and customizability, to allow instancing and all culling types.
@oskar_schramm 29 days ago ⁺¹
Absolutely, great addition. There is just too much to cover in 1 video, will probably make a follow-up
@WhipsterCZ 1 month ago ⁺¹
Thank you. Perfect and short explanation.
@oskar_schramm 1 month ago
Hey, glad you appreciate the fast-paced format
@Johnny31323 1 month ago ⁺³
It's interesting how a game like Battlefield 1 from DICE can be so pretty and yet so optimized that it runs on very simple hardware without stutters, graphical glitches, frame drops or frametime issues.
Then you go and look at CS2, which is yet to be optimized: whereas Battlefield 1 can easily run with 62 players in a match, CS2 struggles with 10 of them.
Not to mention that CS2's performance has only gotten worse with subsequent updates. So sad.
Valve has all the time they want/need to optimize, yet they don't, while DICE was working within a time limit and still implemented so much, with way more effort.
Edit: All games have turned to DLSS and FSR for help, when they don't actually help overall performance, because you're only optimizing resolution-dependent performance on the GPU while the CPU keeps chugging like crazy. Every single game nowadays is not GPU limited but CPU limited, with GPU performance features that don't help if you're already CPU limited because the game is not optimized via distant LODs and occlusion culling and way more...
@cory99998 1 month ago ⁺²
thank you! This gives me a lot to explore
@oskar_schramm 1 month ago
Glad you enjoyed it. And finding things to explore more was exactly what I was going for :)
@MaakaSakuranbo 1 month ago ⁺⁴
Guh, I hate screen space reflections. Always so weird when you look at water and stuff in the reflection just disappears as it goes off the screen
@oskar_schramm 1 month ago
Yeah, SSR is well known for being hated, both by players and by developers, but it does serve its purpose, and with a proper reflection implementation it should smoothly transition to a fallback like reflection probes to limit artifacts. Also, it's one of the simplest real-time reflection techniques we have that isn't ray tracing.
@AlexGoldring 26 days ago
I think SSR is great. The problem comes from expectations. If you look at a mirror-like surface, such as water, you can see a lot of detail in reflections, and you expect them to be accurate. If you look at brushed metal, you won't see much detail at all. You expect the detail, and when it's incorrect or disappears - that's frustrating.
Why I love SSR - it makes at least a portion of the screen much more realistic. Reflections are a part of the real world; we do a lot of hacks and hand-wavy approximations, and SSR is much more faithful to reality. Things like Lumen in Unreal use SSR, and fill areas that don't have sufficient information with other techniques, such as ray tracing and environment maps.
"Real-time graphics" has a limitation, it's in the name: "real-time". We have to make compromises. In film they spend ~1h per frame on very powerful hardware; for games that's not an option, so we can't use the same techniques. We have to cheat. And the goal is to produce a pretty picture first, and realism second. So SSR is a good compromise in that direction, it makes the picture prettier by adding extra color variation in the frame.
@MaakaSakuranbo 26 days ago ⁺¹
@@AlexGoldring
I'm not talking about brushed metal, I'm talking about a body of water for example.
You look at it, you see the cliff above it reflected, you pan the camera down a bit more, the cliff vanishes as it's too far off screen, feeling very weird. (And no, it's not from the different angle causing no reflections, the sky behind the cliff is no reflected, is all).
Idk, I get there are limitations, but I feel like if you frequently see such jarring reflection changes when moving around big bodies of water, and there are a good number of such bodies, the jarringness of the change can be worse than having no reflection to begin with.
@TheBackstreetNets 2 months ago ⁺⁹
This is a brilliant video, very useful. But that clock in the background not moving distracted me the entire time.
@oskar_schramm 2 months ago ⁺²
Thanks, glad you liked it! Haha, yeah I just hate the tick of it, so I unplugged it. Now it's just a not very appealing wall decor, but it’s always high noon here 🤠
@Ahrone1586 1 month ago
@@oskar_schramm based McCree ref
@Betruet 1 month ago ⁺³
he managed to squeeze 17 mins into less than 1 minute. true optimisation
@SmellsLikeRacing 1 month ago ⁺²
I feel like the Frustum Culling animation was wrong (or oversimplified) even though the spoken text was correct. As I understand it, culling is not just about what gets rendered, but also about what gets sent to the GPU. If the game uses meshes (most games), then an intersecting mesh will still get sent to the GPU and processed. If the game uses meshlets (Alan Wake 2 and UE5 games with Nanite), meshlets that are neither inside the frustum nor intersecting it get culled.
@oskar_schramm 1 month ago ⁺¹
Yes you are right, the animation was not representative.
@michawhite7613 2 months ago ⁺⁴
Deferred shading is usually slower than forward shading. You need large g-buffers in order to do it, and transferring all that memory is very slow.
@oskar_schramm 2 months ago ⁺¹
@@michawhite7613 Not necessarily.
Memory bandwidth is not everything, especially in PC games.
You are correct that forward can be better, but that is usually when multiple other optimization techniques support it, like clusters and z-prepass.
Deferred would not be an industry standard if it was worse.
@DevLancelot 1 month ago ⁺¹
@@oskar_schramm Making a game around forward shading will give you better FPS than deferred shading. The only time deferred is better is if you're using more complex (and less optimised) lighting. The industry standard for mobile games is forward
@DiegoSynth 10 days ago
@@DevLancelot Unless something drastically changed, last time I checked Deferred Rendering was much faster at rendering scenes with multiple lights. With the drawback of rendering geometry and light as he explained, and the transparency limitation. Maybe on newer hardware this is no longer noticeable, but 15 years ago, if you had multiple lights, you had to go the deferred way.
@usercontent2112 2 months ago ⁺²
Very nice video. I just know half of these!
@floinseler 1 month ago ⁺¹
just impressive, thank you very much
@oskar_schramm 1 month ago
Thanks! Glad you liked it.
@Faby__ 1 month ago ⁺¹
Great video!
@Adi-rb3vr 1 month ago ⁺²
I wonder how many (if any) are being done automagically already in Unreal Engine when you first start it up with a template?
@oskar_schramm 1 month ago
Oh yeah, 100%, Unreal uses many of these.
I can’t say for a fact which ones they use, but what I can say is that there isn’t a 3D game engine that doesn’t do frustum culling.
Deferred shading, instancing and/or batching, shadow atlasing and LODs are some other things they very likely use by default.
@Capewearer 1 month ago ⁺⁴
7:59 - "But it cannot do soft shadows". What a horrible lie. The Dark Mod did it, you can allow smooth shadows both with stencil shadows mode or with shadow mapping.
@oskar_schramm 1 month ago ⁺²
You’re right that The Dark Mod achieved soft shadows, but it’s important to clarify: stencil volume shadows alone create hard edges due to their binary nature. The Dark Mod used extra techniques like penumbra widening and post processing blurs to mimic soft shadows. So, while stencil volumes can be enhanced, they don’t natively support soft shadows without these added tricks. Thanks for mentioning though, will try to be more specific in the future.
@ytubeanon 1 month ago ⁺⁴
wish I could tell A.I. to generate a github video game demo showing each of these optimizations with examples
@glitchdev 1 month ago
wdym with GitHub video game?
@ytubeanon 1 month ago ⁺²
@@glitchdev github project containing video game files
@glitchdev 1 month ago
@@ytubeanon ah I see
@oskar_schramm 1 month ago ⁺²
Yeah, I regret not having demos/more practical examples in my videos, will see what I can do in the future.
@Magnymbus 1 month ago ⁺³
Screen space reflections are extremely distracting imho.
@oskar_schramm 1 month ago
You mean reflections as a whole or specifically SSR? Because with a proper reflection implementation you shouldn’t see the difference.
Obviously you will see a lot of artifacts either way with screen space solutions, but games can always do SSR with, e.g., reflection probes as a fallback to handle the artifacts.
SSR is a pretty ugly solution to the problem altogether, but it works in some cases and it’s faster than other solutions, so that’s why we use it.
@RiasatSalminSami 1 month ago ⁺¹
Agreed. I'd rather just have cube maps because the abnormal cut-off with SSR is too ugly.
@mouloudagaoua 1 month ago ⁺¹
well done
@oskar_schramm 1 month ago
Thanks :)
@ThePlayerOfGames 1 month ago
5:11 this is Lumen isn't it? Except Lumen updates so slowly it looks like fairy lights and fireflies rather than torches, pointers, or fluorescent light flicker
@oskar_schramm 26 days ago
I think you are correct that it's using voxel based GI, but it's not only VBGI.
I think it's a group of techniques, woven together, and given a fancy name like Lumen. If I recall correctly, Lumen has these components:
Software Raytracing, SSR, SDF, and the fallback is some kind of voxel GI for acceleration, but I could be wrong
@hexiy_dev 1 month ago ⁺¹
so helpful
@oskar_schramm 1 month ago ⁺¹
Glad it is serving its purpose :)
@phutureproof 24 days ago
so why exactly do i NEED TO KNOW this?
@forasago 1 month ago ⁺⁷
I can tell from the techniques I myself understand that you are not trying to explain anything and frankly I doubt you understand most of the things you're listing in this video. If so, what is the point? You're listing technical terms with a vague and sometimes even incorrect description, adding no value. The very first item is already a perfect example: Frustum culling does not cut objects in half! The visualization you show at 0:39 is simply wrong. Frustum culling would remove other rectangles NOT PICTURED. It wouldn't do anything to the rectangles shown. They are all at least partially inside the frustum so they would be rendered in full. You even explicitly say "WE CHECK AGAINST THE BOUNDING BOX OF EACH OBJECT", contradicting what's on screen. If you only check the bounding box this necessarily means that any overlap will lead to full rendering of the object. You're culling entire objects, not vertices. Text correct, video incorrect. Just one example. It would be exhausting to go through the whole video like this.
@oskar_schramm 1 month ago ⁺¹
Thanks for mentioning this, and I’m sad to hear that you think it doesn’t add any value. I agree with you, the visuals were not always representing the tech correctly for various reasons.
@bodardr 1 month ago ⁺¹
So yes you're right about the frustum culling bit, but I tend to think of this video more as a glossary than a tutorial. It's a great checklist for any project that is experiencing graphical bottlenecks. You have to look into each technique anyway, so there's not a lot of harm done here.
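The bounding-box check discussed in this thread can be sketched as follows. This is a minimal illustration, not any engine's implementation: it assumes the view volume is given as six inward-facing planes, and for readability it uses a simple box-shaped volume as a stand-in for a perspective frustum (the same test works for any set of inward-facing planes). `aabb_visible` and `VIEW_VOLUME` are hypothetical names.

```python
def aabb_visible(box_min, box_max, planes):
    """Return False only when the box lies entirely behind one plane.

    planes: list of ((nx, ny, nz), d) with inward-facing normals, so a
    point p is inside a plane when dot(n, p) + d >= 0.
    The test is conservative: it may keep a few boxes that sit just
    outside a corner of the volume, but it never drops a visible one.
    """
    for (nx, ny, nz), d in planes:
        # Pick the box corner that lies farthest along the plane normal.
        px = box_max[0] if nx >= 0 else box_min[0]
        py = box_max[1] if ny >= 0 else box_min[1]
        pz = box_max[2] if nz >= 0 else box_min[2]
        # If even that corner is behind the plane, the whole box is outside.
        if nx * px + ny * py + nz * pz + d < 0:
            return False
    return True  # inside or intersecting: the object is drawn in full


# Stand-in view volume: -1 <= x <= 1, -1 <= y <= 1, 0 <= z <= 10.
VIEW_VOLUME = [
    ((1, 0, 0), 1), ((-1, 0, 0), 1),
    ((0, 1, 0), 1), ((0, -1, 0), 1),
    ((0, 0, 1), 0), ((0, 0, -1), 10),
]

inside = aabb_visible((-0.5, -0.5, 1), (0.5, 0.5, 2), VIEW_VOLUME)
straddling = aabb_visible((0.5, 0.0, 1), (3.0, 1.0, 2), VIEW_VOLUME)
outside = aabb_visible((2.0, 0.0, 1), (3.0, 1.0, 2), VIEW_VOLUME)
print(inside, straddling, outside)  # True True False
```

The straddling box is kept whole, which matches the point made above: culling removes entire objects, it does not clip them.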
@BusinessWolf1 1 month ago ⁺²
Shave off the transparent bits of meshes on alpha-clipped ones.
@oskar_schramm 1 month ago
Yes, discarding when reading alpha 0 from an alpha texture is a great optimization. Usually this is used on, e.g., foliage, so another great thing one could do there as well is 'checkerboard discarding' based on distance from the camera. Just be sure not to mess up the fragment shader's parallelism with this, as discarding is prone to branching
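One possible reading of distance-based checkerboard discarding, written as plain Python rather than shader code. The threshold, the pixel pattern, and the name `keep_fragment` are assumptions for illustration; in a real fragment shader the `False` case would correspond to a `discard`.

```python
def keep_fragment(x, y, distance, start_distance=50.0):
    """Return False where a shader would discard the fragment.

    x, y: integer pixel coordinates; distance: distance from the camera.
    """
    if distance < start_distance:
        return True             # near the camera: keep every fragment
    return (x + y) % 2 == 0     # far away: keep one checkerboard colour only


# Count surviving fragments in a 4x4 pixel block of distant foliage.
kept = sum(keep_fragment(x, y, 80.0) for y in range(4) for x in range(4))
print(kept)  # 8 of the 16 fragments survive
```

Whether thinning fragments this way is a net win depends on the content and hardware, so it is worth measuring before relying on it.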
@миииц 25 days ago
you should rename the video from this to "list of optimizations used in 3d engines" :|
@sharkgamestudio7630 18 days ago
Bro presenting standard and very basic optimisation techniques like they are genius -_- i hate 2024
@635574 26 days ago
And this is why you will never make a game if you decide to optimize before you prove your prototype is good. Also why graphics programming is its own specialisation.
@the-guy-beyond-the-socket 1 month ago
0:50 this is a bad explanation of frustum culling. The video shows the parts of objects that are not in the frustum getting cut off, while in reality almost no engines work that way. And the video title is misleading. It reads like it's one technique that nobody knows about, when in the video there are at least 20 of them and every mid dev knows about them
@oskar_schramm 26 days ago
The frustum part I totally agree with. I don't think the title is misleading, Optimization(s) plural. And not all videos are catered towards mid level, or entry level or senior. I try to do what I think is missing in the space.
@hmmmidkkk 2 months ago ⁺²
Well-researched video and well presented and so few likes?
@oskar_schramm 2 months ago ⁺¹
Thanks for the kind words!
As long as someone learns from it and/or finds it interesting I'm happy!
@SkeletalRavenArts2 2 months ago ⁺¹
First!!! Lol
