Another "con" is that you can't use MSAA in deferred rendering. Only post-processing AA like: FXAA, TAA, SMAA, CSAA. Because the lighting calculations are only done ONCE for the "top most" visible pixels, not for their "antialiased" bastardized pixels, lol. While in forward rendering you can. So if you're a fan of MSAA (which is a _hardware feature_ since like the 90's or early 2000's), then forward rendering is the way. But keep in mind that MSAA also comes with its own limitations, like it only applies AA to the _geometric_ part of the triangles. It does absolutely nothing for the inside of the triangle, so if a texture is noisy or the specular is noisy (because the normal map contains too much detail and "reflects" light at sudden sharp angles), then you ARE gonna notice the aliasing and it's gonna bother you. And a lot of people seem to hate TAA, for some reason... There's even "r/FuckTAA" - _a subreddit dedicated to discussing the plague of blurry anti-aliasing methods that are ruining the visuals of modern video games._
I'll copy my reply about MSAA from my own comment: MSAA is not better than TAA for several reasons: 1. Performance hit is too much on "modern" art - MSAA works extremely poorly when triangle density is high, in worst case with 4x MSAA you are hit with 16 times more pixel shader invocations if your triangle overlaps a lot of adjacent pixel centers, this can easily quadruple your frame time, especially if you have complex shaders. MSAA compression is also designed for large triangles, it doesn't work if your art direction needs high triangle density. This assumes forward renderer, with deferred you get all these issues on top of crazy bandwidth increase which is just not practical. 2. It does not address specular aliasing, MSAA works by computing a coverage mask for subsamples and stores the result of your pixel shader invocation for each affected subsample. It does not shade each individual subsample, it only interpolates the resulting colors. It's not supersampling, because supersampling would actually invoke the pixel shader for each subsample instead of interpolating it. There is another way to reduce shader aliasing by filtering your roughness and normal maps, this is called "von Mises Fischer" or "vMF" filtering, it doesn't completely remove specular aliasing but helps a lot with it. Some games do VMF with combination of something like non-temporal SMAA and I think it's a pretty good compromise (and doesn't need motion vectors), but you need a rather complex art production pipeline and someone who enforces art guidelines. This is often very expensive to do unless you have a strong tech art discipline in your team. AI TAA is not really that bad, give it a try. XeSS/DLSS have almost zero ghosting and really good image clarity and stability, they only struggle when there's a lot of translucency and particles on the screen. Now if you have a VR game your situation is very different, you cannot use TAA in VR at all because people will start throwing up when playing your game, so your only option is MSAA or SMAA. This severely limits what art direction your game can have and increases art production costs/time. And you pretty much have to implement a vMF pipeline in your project. FXAA is really not an option nowadays, it needs to retire with honor and dignity.
@stysner4580 True, but IMO a "good" TAA implementation is not supposed to look as good as something like SSAA. It should just provide good anti aliasing at a minimal cost to the visuals. The UE5 implementation just has too much smear because they use too many frames. I don't like the fact that doom eternal has forced TAA, but it doesn't look that bad there.
@@bazhenovc754The temporal antialiasing for VR thing applies to non VR as well to a lesser extent. It's just totally wonky and looks and feels wrong. Complicated problem but higher rendering resolution reduces lateral aliasing and higher framerate reduces temporal aliasing.
Permutations reduction is a strong case for a deferred renderer. If you have many complex materials deferred renderer can be a better choice. Overall, I don't have a strong opinion on whether a forward or deferred is "better". They both have their merits and drawbacks.
There are better ways to reduce the amount of shaders that you have, decals being one of them. If your art team does not have to create a unique shader every time they want to have some small scratches on the surface or other minor details then you won't have the shader explosion problem in the first place.
Nice presentation. However, you're not addressing a big issue with forward renderers : shaders combinatory explosion! I worked on Dishonored 2 a few years ago, we were using a modified version of Id Software's idTech 5 engine and at the very end of production (circa 2016) we were compiling about 50,000 shaders (distributed compiling of course) because of all the variations (transparent, cutoff, decals, translucency, all the kinds of BRDFs and so on and so on). It took *AGES* to compile and it was a big source of headaches and delays in production, as well as a nightmare to optimize! With deferred, you only have to handle a couple thousands of variations, it takes at most a few minutes to compile, and it uses a unified lighting pipeline that also makes shaders smaller and easier to maintain...
Dishonored 2 is in the top 5 of my most favourite games of all time, I absolutely loved it! I'd like to hear more about your shader explosion issue if you have the time, can you pinpoint what exactly caused it in the first place? My general thoughs on this is that if you have a good decal system like in idTech5 then you can use decals most of the time when you need various surface variations and your art team doesn't have to create a new shader every time they want a slightly different scratch on the surface and they can just use decals for these small details. Of course you still have gameplay-specific shaders, but there shouldn't be 50k of them right, Doom managed to ship with like 100-ish shaders in total and visually it looks amazing.
tiled rendering with lots of onscreen lights was often touted as a benefit of deferred, but that relies on a false assumption about how light works, very rarely will you have lots of small point lights with a small radius, perhaps street lamps in a city like GTA. but most lights are going to throw far, likely across the entire screen in order to look natural. To keep the benefit of tiled rendering, light radius and falloff are artificially capped to a small radius and a harsh edge, which looks very artificial, if combined with gamma correction it will make the falloff look even harsher and even more unnatural. really very few actual scenes fit the way the tech demos with hundreds of lights were constructed.
@@charlieking7600 I'm not. Tiled rendering is often presented as a benefit of deferred. Most deferred rendering tech demos have hundreds of point lights to show it off.
I've often felt like an industry outcast for being so largely against deferred rendering. I'm not saying it never has a place but I feel like studios now just default to it without understanding the substantial drawbacks.
I looked at Godot's documentation recently, and it seems they use a modified version of Forward+ rendering for performance reasons. It's still not a very popular engine though.
We can't even make a proper sky for VR in UE5 now. It's all "lumen lumen lumen nanite nanite" and such poo. They push everyone to try and make AAA games when UE4 was actually something for small developers and teams as well. We can't all be Ubisoft. I was working all day having a blast with UE4 and now with 5 I feel like I almost quit. Old techniques don't work well anymore and new techniques also don't work well...
Yeah VGPR pressure is generally higher in a forward renderer on average - this basically trades VGPR pressure for bandwidth and bandwidth is generally more valuable in my opinion, especially if you're porting to VR or mobile. I've seen a lot of deferred-to-forward ports where forward consistently outperformed deferred without any actual shader logic changes. It's also worth noting that the added VGPR pressure is coming from the added light processing loop and lighting model evaluation, which is typically not that much overhead. Crazy complex node-based shaders are often a much bigger problem and they also impact deferred shading in a similar fashion. Also in a forward renderer it is easier to manage and scale it down where needed, for instance with a shader LOD system that uses a simplified lighting model for distant objects or even having a simplified lighting for "non-important" objects on a low graphics preset - in a deferred renderer you will need to branch on a g-buffer flag to evaluate those and branching doesn't reduce the allocated VGPR, the compiler will still allocate for the worst case of both paths and it would impact the entire shading pass (assuming that you light everything in CS and not with drawing geometry that samples the g-buffer). At the same time it's also easier to scale it up, you can have a more fancy and expensive lighting model for specific key objects in the game and they won't impact the performance of your entire shading pass.
On a side note, occupancy by itself is not the end goal, the goal is not to stall on memory loads and you have other tools to deal with that. Having high occupancy can even reduce your overall performance if you have multiple work batches invalidating the cache all the time.
I think you forget one important aspect of forward vs deferred: *depth pre-pass.* _Edit: which you mention at __24:57_ 😅 With a forward render, you really do need an initial "pre-pass" only for depth. Meaning you have to render ALL your meshes to the depth buffer (using an empty fragment shader). This incurs an extra cost. While with a _deferred_ renderer you don't really need one. Not really. I mean, you CAN if you wanted to, but it's often not used. If you choose a forward renderer and you skip the depth pre-pass (and you don't sort your meshes), you risk running (potentially) expensive lighting calculations for the fragments that the depth buffer will discard, thereby throwing away whatever work the GPU did. This is the entire reason why deferred shading was invented in the first place! 😄 I'm not necessarily saying that deferred is better. It depends entirely on your scene. Is it a "corridor" type shooter? Is it an open-world scene with lots of things behind other things? How "deep" is your scene, and can you cull those meshes on the CPU (or GPU)? These are the questions you should be asking when trying to choose between forward or deferred. Another important questions is: how many lights do you want? Because deferred easily supports _tens of thousands_ since they're basically like stamps drawn in 2D on the color buffer.
This was discussed in the video, 2 pass forward is generally faster than deferred. The cost of having an extra depth pass in a modern optimized forward renderer is lower than the cost of using that much extra bandwidth in a deferred renderer, especially so if you use the "stamps in 2D on the color buffer" approach - this will waste a metric ton of bandwidth. In order to for a deferred renderer to compete with forward at all you need to use a tiled deferred shading with a compute shader that caches gbuffer samples in TGSM.
Great video, Kirill. I really appreciated your deep dive into the complexities of deferred and forward renderers. But I also have a question: I'm eager to hear your detailed perspective on the use of depth prepass, especially in the context of intermediate-based GPUs. In recent discussions, especially on platforms like Twitter/X, there seems to be a divided opinion among GPU developers regarding its necessity. Some argue that the benefits have become too marginal to justify its use even on current intermediate-based GPUs even on forward rendering, and point out the fact that it is disadvantageous on tile-based GPUs due to the inherent principles of tile-based rendering utilized in mobile GPUs (which is out of the question anyway, since it is a fact). On the other hand, there is a section of the GPU development community that staunchly supports the use of depth prepass on immediate-based GPUs, emphasizing its role in minimizing or even avoiding overdraw during each subsequent geometry rendering pass. They argue that it fosters a more efficient rendering process. Personally, I align with the latter viewpoint. I find that using depth prepass is not only beneficial for reducing or avoiding overdraw but also plays a crucial role in enhancing features such as SSAO, early motion vectors, and so forth, before the actual main geometry pass. In my own engine (as a part of my PasVulkan project), which uses the Forward+ rendering technique coupled with depth prepass, bindless all-in-one-go-per-pass techniques and so forth, I have observed significant benefits in this matter. I'm curious to know where you stand on this matter and would appreciate your detailed opinion.
If you are not running on a tile-based deferred GPU (TBDR) then you kinda need a depth prepass in a forward renderer, you can try turning it off every now and then and see what happens (I've tried it a few times and it always was a big performance hit). Keep in mind that not all tiled GPUs are TBDR - there are also tile-based immediate GPUs (TBIR) and they DO need a depth-prepass. Personally I'm not a fan of doing early motion vectors in a depth prepass, you are losing the benefits of a fast depth-only rendering if you do that. Like I've mentioned in the video, a true depth-only pass without a pixel shader is a fraction of the cost because GPUs have a separate way of handling that and it gets disabled if you add a pixel shader to write motion vectors. To make the most use of the depth prepass you should try rendering all opaque objects first without a pixel shader, then render everything with an alpha test on top of it. Alpha-tested geometry won't get that fast rendering either but there's no way around it, if you need alpha testing you pay for it.
@@bazhenovc754 Okay, I should have explained that better. 🙂After my depth-only buffer depth-prepass without color-writes, a sequence of few draw calls (or just one big multi indirect) with activated color-writes (for motion vectors and normals) directly follows, but still in the same "depth-prepass" renderpass from the view of my frame/render-graph implementation. For my engine, I took a lot of design ideas from Doom Eternal.
@@BenjaminRosseaux Yeah that makes sense, I misunderstood what you initially wrote. You could write motion vectors from the main forward pass too, I'm not sure if it's very useful to have them early.
I'm curious how you feel about visibility buffers, it is fundamentally deferred shading with 32bits per pixel (64 if we're counting depth) and does also address some of the flexibility concerns since you can make different pieces of the screen draw with different shaders, textures, attributes, etc.
The visibility buffer has its own pros and cons and it's probably a good topic for a separate video. Overall I like it, but I think that most games won't really benefit from it unless the art direction wants extremely dense and detailed geometry.
Hello! Decided to visit this presentation again to set up a proper Forward Renderer for our upcoming project. I'm wondering what's the advantage of using RGB32_UINT instead of 3 separate targets each with specific format (R11G11B10_F, R16G16_(F|SNORM), R32_UINT)? How alpha blended is performed into such render target? How bilinearly sample from it? Does it make reading from such target less cache efficient? Seems like I'm missing some crucial knowledge.
You will not get correct results either way, gbuffer normals are not color and if you use alpha blending on that you will get random results and it will look bad. This is the same issue with deferred shading as well - you can't alpha blend normals and material properties, that data is not color. So with that regard, the texture format doesn't matter that much. If you need bilinear sampling you can split some stuff into separate filterable render targets, but try to keep the overall bandwidth under control (i.e. 128 bits in total, use 3 MRT 32bits color filterable, 32bits normal filterable, 64bits packed non-filterable) but keep in mind MRT is going to be a bit more expensive even if the overall bits per pixel is equivalent. Same goes for alpha blending, if you have a ton of translucent geometry it might make sense to keep the color separate with alpha-blendable format. For transparency you essentially have 2 options: 1 - separable blending, color target uses alpha blending, all other targets replace the data. This will make the color appear blended, but AO will be using the translucent front surface as input and AO won't be visible behind it. 2 - only use alpha blending and don't touch gbuffer normals at all. This will make the color appear blender, but AO will be using the opaque surface behind it as input - AO will be visible behind the translucent surface, but won't be applied to the translucent surface itself. Same goes for DOF and other that uses gbuffer inputs in one way or another. Depending on your game, you can use either 1 or 2 or both - the difference will be in how these things interact with post processing (different types of artifacts). Experiment with that and find the right combination that works for your game.
Thanks for the presentation. Am not a graphics programmer but an Unreal Engine noobie learning tech art. I noticed that when I enabled forward renderer + fxaa, the visuals improved greatly plus I got about 50 to 60 fps running a basic level on a Ryzen APU. Am glad to see forward renderer being recommended despite what the unreal docs say.
They've added a forward renderer because their deferred renderer didn't work in VR (too slow), it's pretty good but there are some features missing or not supported.
Shame you didn't touch on topic of having lots of cut out geometry, like rendering forests. You can't just not use pixel shader on those, although you could set color mask to 0 I know that unity's book of the dead demo still had massive benefit from using prepass even with deferred rendering. Still, I also read in "Geometry Pipeline in CallOfDuty" that forward kinda sucks for cut out geometry?
You only need to render prepass with the alpha clip enable, but render it after all opaque. Then the main pass can be rendered fully opaque with earlyZ, the depth buffer will naturally have gaps where texture alpha was 0. The prepass with a pixel shader will be more expensive and I don't think this is a solvable problem. You can optimize geometry, i.e. www.humus.name/index.php?ID=266 to reduce that overhead.
Год назад+1
Awesome presentation. This will gonna help on my hobby game engine.
CPU frustum culling is damn cheap but draw calls aren't cheap so unless your system is 100% running on GPU,there is no reason to have GPU frustum culling.
Indirect draw calls are a prerequisite here. You basically have an array of draw call structures and a counter. A compute shader does the culling test and, if passed, atomically increments the counter and writes the draw call values. Then it's submitted for execution using a SINGLE cpu call (e.g. ExecuteIndirect on DX12)
Thanks for great in-depth material man! One day I might be able to grasp everything you say but not today unfortunately. What do you think about deferred shading on modern mobile devices when having open world 3D environment but with only few local lights? Would your advice still be so radical against deferred? Thanks.
Allowing for custom BRDFs in Forward+ is greatly underrated. No serious graphics programmer should be happy with Disney-GGX'ing everything like most modern deferred games. You also missed talking about bloom and AO, which are everywhere and big enough to mention in the pipeline overviews IMO. True forward -> SSAO is bad as lacks normals, needs Depth+Normals. Deferred = already guaranteed everything for it.
20:27 "Forward shading still produces a g-buffer". No, it doesn't. There's no "g-buffer" _at all_ in forward rendering. 🤨 Are we talking about the same thing?
I guess it talks about the depth prepass, that forward rendering may rely on to prevent overdraw, it may include also any additional data for further effects, but overall, forward shading means it does all the lighting calculation per mesh, not per light, on a diferent pass.
When did people start calling them "deck-ullz"? It's always been "dee-kal" where "kal" rhymes with "pal" or "Cal" in "California". This is the second time this week I've seen a gfx programmer say it "deck-ull" on RUclips. It's even "dee-kal" on dictionary sites (literally) and hundreds of older videos. Why is the pronunciation changing suddenly? Have people just only seen the word any never heard it said before somehow, in this day and age of streaming video? It reminds me of watching game reviewers saying "lin-neer" for the word "linear" because they've only read it and never heard it, and they see the word "near" in it and automatically assume it's a 2-syllable word instead of the 3-syllable word it's always been since before they came along and mis-read it.
22:11 _(regarding SSR)_ "You cannot do this with deferred shading because you need the normal and roughness". I think you got that BACKWARDS, chief. 😆 You can't do that in a *_forward_* renderer, because you don't have the roughness as a separate render target. But you totally CAN with deferred! "Deferred" means putting it off (or "deferring" it) to a later date during the frame rendering timeline, so that you do the (potentially) expensive lighting calculations _ONCE_ per pixel.
You need to work on your listening comprehension, chief 🙂You are taking phrases out of context and constructing an imaginary windmill to attack, the phrase "cannot do this" does not refer to SSR and I explained earlier in the video what a forward renderer is in the context of this presentation.
Alien isolation runs on my toy laptom, doom 3 / quake 4 doesnt run on my toy laptop. Why? I know that alien isolation uses deferred rendering and Doom 3 and Quake 4 do some stupid shit. My laptop is 0.3 TFLOPs and 128VRAM.
i don't like deferred shading because i see it is verbose,although it might be more efficient.i,'m making hard decision to do or don't implement deferred shading because it also change my shader alot.
Can you elaborate which statements? I admit that most of my experience is with consoles and PC, although I did work a bit with various mobile GPUs. There is TBDR and TBIR, you don't need a depth prepass on a TBDR because the HW already deals with overdraw for you with FPK/HSR, so a depth prepass is just not doing anything useful. But you still need it on a TBIR if you have a forward renderer, all my other arguments should still apply though? I know that you can try to keep the G-Buffer in the tile memory and that mitigates a lot of bandwidth issues, but if you need the G-Buffer for postprocessing (SSR or anything that needs arbitrary sampling of the render target) then you're forced to move it off-tile anyway or scale down and make other concessions (same issue with an on-tile depth buffer - if you have anything going on with your postprocessing you can't really keep it on-tile because you'll need it later). Not to mention that on-tile memory is a scarce resource and there might not be enough of it for a fat G-Buffer. So I still don't see any distinct advantages of deferred renderer over the forward renderer even on TBDR - on these types of GPUs forward renderer at worst shouldn't perform worse than a deferred renderer but it comes with a lot of flexibility. And from what I've seen, all TBDR vendors recommend using a forward renderer anyway.
Hi @@bazhenovc754, great talk by the way, I forgot to mention in my last comment xD. As you pointed out, things like mobile GPUs having some sort of HSR which makes the depth prepass suboptimal, but you still need the depth for things like the SSAO mask generation. To be able to stay in tile memory (you should be able to get more than enough for the GBuffer data) mitigates a lot of bandwidth, not entirely as you will still have to end up writing once to let later passes use the GBuffer but at least you will stop doing multiple writes/loads. Also, some of the comments focus a lot in the assumption of Compute for everything, which is true for Desktop and Console territory but overall on mobile maximizing the use of fragment shaders is still the general advice. So, overall I agree a Forward renderer, as long as you have a really limited amount of lights is going to be the most efficient approach. Unfortunately, modern graphics tend not to align well with this approach as you might need access to material information later in the frame (i.e screen probes based GI techniques, reflections ...) which force people to end up having to write some sort of GBuffer anyway.
Funny enough I’m actually using deferred rendering to upgrade an old game called gunz the duel. I’d rather used forward+ but not willing to upgrade the games entire engine from direct3d9 😂
Ahaha, dude never heard about tiled rendering architecture. Everything he says here is only relevant for immediate mode architectures. So Use deferred for tiled architectures.
@@homematvejif you have any kind of complex post processing, you are going to move the gbuffer off tile, with forward rendering this is 2 render targets less than deferred. The post processing is unchanged in both cases. By “complex” I mean effects that need to access adjacent pixels - any kind of antialiasing for example. Sure you can run simple stuff like tone mapping on tile, but then your AA quality will suffer if you run it post tonemap. I can assure you that I’ve heard about tile based architectures and I can also point you to vendor documentation that clearly recommends forward shading - community.arm.com/arm-community-blogs/b/graphics-gaming-and-vr-blog/posts/killing-pixels---a-new-optimization-for-shading-on-arm-mali-gpus You need to learn more about the hardware, TBDR in particular.
The kind of forward renderer discussed in this video would be firmly in the “Forward+” category already. Depth/normal/etc prespasses, occlusion, clustered lighting and decals… all of these are “forward+” things
Another "con" is that you can't use MSAA in deferred rendering. Only post-processing AA like: FXAA, TAA, SMAA, CSAA. Because the lighting calculations are only done ONCE for the "top most" visible pixels, not for their "antialiased" bastardized pixels, lol. While in forward rendering you can. So if you're a fan of MSAA (which is a _hardware feature_ since like the 90's or early 2000's), then forward rendering is the way. But keep in mind that MSAA also comes with its own limitations, like it only applies AA to the _geometric_ part of the triangles. It does absolutely nothing for the inside of the triangle, so if a texture is noisy or the specular is noisy (because the normal map contains too much detail and "reflects" light at sudden sharp angles), then you ARE gonna notice the aliasing and it's gonna bother you. And a lot of people seem to hate TAA, for some reason... There's even "r/FuckTAA" - _a subreddit dedicated to discussing the plague of blurry anti-aliasing methods that are ruining the visuals of modern video games._
I'll copy my reply about MSAA from my own comment:
MSAA is not better than TAA for several reasons:
1. The performance hit is too big on "modern" art - MSAA works extremely poorly when triangle density is high. In the worst case with 4x MSAA you get 16 times more pixel shader invocations when your triangles overlap a lot of adjacent pixel centers, which can easily quadruple your frame time, especially if you have complex shaders. MSAA compression is also designed for large triangles; it doesn't work if your art direction needs high triangle density. This assumes a forward renderer - with deferred you get all these issues on top of a crazy bandwidth increase, which is just not practical.
2. It does not address specular aliasing. MSAA works by computing a coverage mask for the subsamples and storing the result of your pixel shader invocation in each covered subsample. It does not shade each individual subsample - it only replicates the resulting color. It's not supersampling, because supersampling would actually invoke the pixel shader for each subsample instead.
There is another way to reduce shading aliasing: filtering your roughness and normal maps, called "von Mises-Fisher" or "vMF" filtering (see the sketch below). It doesn't completely remove specular aliasing, but it helps a lot.
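To make that concrete, here's a minimal sketch of a Toksvig/vMF-style roughness adjustment in C++ (illustrative only - the function name and the exact variance mapping are assumptions; real pipelines usually bake this into normal-map mip generation):

```cpp
#include <algorithm>
#include <cmath>

// vMF/Toksvig-style specular AA (sketch). Averaging a normal map during mip
// generation shortens the mean normal; that shortening encodes the angular
// spread of the original normals, which we fold into the GGX roughness.
float FilterRoughness(float baseAlpha, float avgNormalLength)
{
    const float r = std::clamp(avgNormalLength, 1e-4f, 0.9999f);
    // Estimate vMF sharpness (kappa) from the mean resultant length.
    const float kappa = r * (3.0f - r * r) / (1.0f - r * r);
    // Treat 1/kappa as extra slope variance and add it to alpha^2.
    const float variance = 1.0f / kappa;
    return std::min(std::sqrt(baseAlpha * baseAlpha + variance), 1.0f);
}
```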
Some games combine vMF with something like non-temporal SMAA, and I think it's a pretty good compromise (and doesn't need motion vectors), but you need a rather complex art production pipeline and someone who enforces art guidelines. This is often very expensive to do unless you have a strong tech art discipline on your team.
AI TAA is not really that bad, give it a try. XeSS/DLSS have almost zero ghosting and really good image clarity and stability, they only struggle when there's a lot of translucency and particles on the screen.
Now if you have a VR game, your situation is very different: you cannot use TAA in VR at all because people will start throwing up when playing your game, so your only options are MSAA or SMAA. This severely limits what art direction your game can have and increases art production costs/time. And you pretty much have to implement a vMF pipeline in your project.
FXAA is really not an option nowadays, it needs to retire with honor and dignity.
@stysner4580 I agree, modern games' AA is ass.
@stysner4580 TAA is not that bad when done well. It gets a bad rap from the horrible UE5 implementation.
@stysner4580 True, but IMO a "good" TAA implementation is not supposed to look as good as something like SSAA; it should just provide decent anti-aliasing at a minimal cost to the visuals. The UE5 implementation just has too much smear because it uses too many frames. I don't like that Doom Eternal has forced TAA, but it doesn't look that bad there.
@@bazhenovc754 The temporal antialiasing problem in VR applies to non-VR as well, just to a lesser extent. It's totally wonky and looks and feels wrong. It's a complicated problem, but higher rendering resolution reduces spatial aliasing and higher framerate reduces temporal aliasing.
Permutation reduction is a strong case for a deferred renderer. If you have many complex materials, a deferred renderer can be a better choice.
Overall, I don't have a strong opinion on whether a forward or deferred is "better". They both have their merits and drawbacks.
There are better ways to reduce the number of shaders you have, decals being one of them. If your art team doesn't have to create a unique shader every time they want some small scratches on a surface or other minor details, then you won't have the shader explosion problem in the first place.
Nice presentation. However, you're not addressing a big issue with forward renderers: shader combinatorial explosion!
I worked on Dishonored 2 a few years ago. We were using a modified version of id Software's idTech 5 engine, and at the very end of production (circa 2016) we were compiling about 50,000 shaders (with distributed compilation, of course) because of all the variations (transparent, cutoff, decals, translucency, all the kinds of BRDFs, and so on and so on).
It took *AGES* to compile and it was a big source of headaches and delays in production, as well as a nightmare to optimize!
With deferred, you only have to handle a couple thousand variations, it takes at most a few minutes to compile, and it uses a unified lighting pipeline that also makes shaders smaller and easier to maintain...
Dishonored 2 is in the top 5 of my most favourite games of all time, I absolutely loved it!
I'd like to hear more about your shader explosion issue if you have the time - can you pinpoint what exactly caused it in the first place?
My general thoughts on this: if you have a good decal system like in idTech 5, then you can use decals most of the time when you need surface variation, and your art team doesn't have to create a new shader every time they want a slightly different scratch on a surface - they can just use decals for these small details.
Of course you still have gameplay-specific shaders, but there shouldn't be 50k of them, right? Doom managed to ship with something like 100 shaders in total, and visually it looks amazing.
Sounds like a self-control problem - there's nothing about forward that says you have to use a billion shaders; deferred simply imposes the limitation.
okay... _why_ are there thousands of shaders?
I ❤ Dishonored!!!
I wish @Patapom3 had replied to this :(
Tiled rendering with lots of on-screen lights was often touted as a benefit of deferred, but that relies on a false assumption about how light works. Very rarely will you have lots of small point lights with a small radius - perhaps street lamps in a city like GTA - but most lights are going to throw far, likely across the entire screen, in order to look natural. To keep the benefit of tiled rendering, light radius and falloff are artificially capped to a small radius with a harsh edge, which looks very artificial; combined with gamma correction, the falloff looks even harsher and more unnatural. Really, very few actual scenes fit the way those tech demos with hundreds of lights were constructed.
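To illustrate the trade-off being described: engines typically multiply the physical inverse-square falloff by a window that forces it to exactly zero at a finite cull radius, so lights can be tile/cluster-culled at all. A C++ sketch (similar in spirit to the windowed falloff Unreal has documented; names are illustrative):

```cpp
#include <algorithm>
#include <cmath>

// Inverse-square falloff windowed to a finite cull radius (sketch). The
// window term drives the intensity to zero at `radius`; the higher the
// exponent, the closer the curve stays to physical before the cutoff.
float WindowedFalloff(float distance, float radius)
{
    const float invSquare = 1.0f / (distance * distance + 1.0f); // +1 tames d -> 0
    const float t = std::min(distance / radius, 1.0f);
    const float window = std::pow(std::max(1.0f - t * t * t * t, 0.0f), 2.0f);
    return invSquare * window; // reaches exactly 0 at the cull radius
}
```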
You're mixing up tiled/clustered shading with deferred rendering. These are completely different techniques that don't contradict each other.
@@charlieking7600 I'm not. Tiled rendering is often presented as a benefit of deferred. Most deferred rendering tech demos have hundreds of point lights to show it off.
@stysner4580 Yes, I've also heard about tiled/clustered lighting in Forward+.
@@doltBmB If you spend 2 minutes searching it up, you can figure out it's forward rendering. Really proud of yourself for spreading misinformation, huh?
Me when I spread misinformation
Solid presentation! Looking forward to more.
Great presentation. There's a small typo on the slide "Optimized deferred renderer" - Normal should be G32 and Emissive should be B32.
I've often felt like an industry outcast for being so largely against deferred rendering. I'm not saying it never has a place but I feel like studios now just default to it without understanding the substantial drawbacks.
I looked at Godot's documentation recently, and it seems they use a modified version of Forward+ rendering for performance reasons. It's still not a very popular engine though.
We can't even make a proper sky for VR in UE5 now. It's all "lumen lumen lumen, nanite nanite" and such poo. They push everyone to try to make AAA games, when UE4 was actually something for small developers and teams as well. We can't all be Ubisoft. I used to work all day having a blast with UE4, and now with 5 I feel like I've almost quit. Old techniques don't work well anymore, and the new techniques don't work well either...
You don't mention VGPR usage and occupancy. These factors can play a big role in the comparison.
Yeah VGPR pressure is generally higher in a forward renderer on average - this basically trades VGPR pressure for bandwidth and bandwidth is generally more valuable in my opinion, especially if you're porting to VR or mobile. I've seen a lot of deferred-to-forward ports where forward consistently outperformed deferred without any actual shader logic changes.
It's also worth noting that the added VGPR pressure is coming from the added light processing loop and lighting model evaluation, which is typically not that much overhead. Crazy complex node-based shaders are often a much bigger problem and they also impact deferred shading in a similar fashion.
Also, a forward renderer is easier to manage and scale down where needed, for instance with a shader LOD system that uses a simplified lighting model for distant objects, or even simplified lighting for "non-important" objects on a low graphics preset. In a deferred renderer you would need to branch on a g-buffer flag to evaluate those, and branching doesn't reduce the allocated VGPRs - the compiler still allocates for the worst case of both paths, and that impacts the entire shading pass (assuming you light everything in a CS rather than by drawing geometry that samples the g-buffer).
At the same time, it's also easier to scale up: you can have a fancier, more expensive lighting model for specific key objects in the game, and they won't impact the performance of your entire shading pass.
On a side note, occupancy by itself is not the end goal - the goal is to not stall on memory loads, and you have other tools to deal with that. High occupancy can even reduce your overall performance if you have multiple work batches invalidating the cache all the time.
I think you forgot one important aspect of forward vs deferred: *depth pre-pass.* _Edit: which you mention at 24:57_ 😅
With a forward renderer, you really do need an initial "pre-pass" just for depth, meaning you have to render ALL your meshes to the depth buffer (using an empty fragment shader). This incurs an extra cost.
With a _deferred_ renderer you don't really need one. Not really. I mean, you CAN if you want to, but it's often not used.
If you choose a forward renderer and skip the depth pre-pass (and you don't sort your meshes), you risk running (potentially) expensive lighting calculations for fragments that the depth buffer will later discard, throwing away whatever work the GPU did. This is the entire reason why deferred shading was invented in the first place! 😄
I'm not necessarily saying that deferred is better.
It depends entirely on your scene. Is it a "corridor" type shooter? Is it an open-world scene with lots of things behind other things? How "deep" is your scene, and can you cull those meshes on the CPU (or GPU)? These are the questions you should be asking when choosing between forward and deferred. Another important question is: how many lights do you want? Because deferred easily supports _tens of thousands_, since they're basically like stamps drawn in 2D on the color buffer.
This was discussed in the video: two-pass forward is generally faster than deferred. The cost of having an extra depth pass in a modern optimized forward renderer is lower than the cost of using that much extra bandwidth in a deferred renderer, especially if you use the "stamps in 2D on the color buffer" approach - that wastes a metric ton of bandwidth. In order for a deferred renderer to compete with forward at all, you need tiled deferred shading with a compute shader that caches g-buffer samples in TGSM.
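For reference, a CPU-side illustration of the tile binning step mentioned here (on the GPU this runs as one compute thread group per tile, with depth/g-buffer samples cached in TGSM; the types and the view-space sphere-vs-AABB test are assumptions of this sketch):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Light { float x, y, z, radius; }; // view-space position + cull radius

// Collect the lights that can affect one screen tile, given the tile's
// view-space bounds (min/max XY from the tile frustum, min/max Z from the
// tile's depth range). This is the same test the compute shader performs.
std::vector<uint32_t> BuildTileLightList(const std::vector<Light>& lights,
                                         float minX, float maxX,
                                         float minY, float maxY,
                                         float minZ, float maxZ)
{
    std::vector<uint32_t> visible;
    for (uint32_t i = 0; i < (uint32_t)lights.size(); ++i)
    {
        const Light& l = lights[i];
        // Distance from the light center to the tile AABB (clamped point).
        const float dx = std::clamp(l.x, minX, maxX) - l.x;
        const float dy = std::clamp(l.y, minY, maxY) - l.y;
        const float dz = std::clamp(l.z, minZ, maxZ) - l.z;
        if (dx * dx + dy * dy + dz * dz <= l.radius * l.radius)
            visible.push_back(i);
    }
    return visible;
}
```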
Our graphics programmers have now started building a hybrid rendering pipeline: flagged objects use forward rendering and are not drawn into the G-buffer, and after the lights are rendered, a forward pass draws them on top.
Great video, Kirill. I really appreciated your deep dive into the complexities of deferred and forward renderers. But I also have a question:
I'm eager to hear your detailed perspective on the use of a depth prepass, especially in the context of immediate-mode GPUs.
In recent discussions, especially on platforms like Twitter/X, there seems to be divided opinion among GPU developers regarding its necessity. Some argue that the benefits have become too marginal to justify its use on current immediate-mode GPUs, even with forward rendering, and point out that it is disadvantageous on tile-based GPUs due to the inherent principles of tile-based rendering used in mobile GPUs (where it is out of the question anyway).
On the other hand, a section of the GPU development community staunchly supports the use of a depth prepass on immediate-mode GPUs, emphasizing its role in minimizing or even avoiding overdraw during each subsequent geometry rendering pass. They argue that it fosters a more efficient rendering process.
Personally, I align with the latter viewpoint. I find that a depth prepass is not only beneficial for reducing or avoiding overdraw, but also plays a crucial role in enabling features such as SSAO, early motion vectors, and so forth, before the actual main geometry pass. In my own engine (part of my PasVulkan project), which uses Forward+ rendering coupled with a depth prepass, bindless all-in-one-go-per-pass techniques and so forth, I have observed significant benefits.
I'm curious to know where you stand on this matter and would appreciate your detailed opinion.
If you are not running on a tile-based deferred GPU (TBDR), then you kinda need a depth prepass in a forward renderer - you can try turning it off every now and then and see what happens (I've tried it a few times and it was always a big performance hit). Keep in mind that not all tiled GPUs are TBDR - there are also tile-based immediate GPUs (TBIR), and those DO need a depth prepass.
Personally, I'm not a fan of writing early motion vectors in the depth prepass - you lose the benefits of fast depth-only rendering if you do that. As I mentioned in the video, a true depth-only pass without a pixel shader is a fraction of the cost because GPUs have a separate fast path for it, and that path gets disabled if you add a pixel shader to write motion vectors.
To make the most of the depth prepass, render all opaque objects first without a pixel shader, then render everything with alpha test on top of that. Alpha-tested geometry won't get the fast path either, but there's no way around it - if you need alpha testing, you pay for it.
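A minimal sketch of that submission order (the engine API here is hypothetical and stubbed out so it compiles; the point is that phase 1 binds no pixel shader at all, keeping the fast depth-only path):

```cpp
#include <vector>

// Hypothetical minimal engine types, for illustration only.
struct Mesh {};
struct PixelShader {};
struct CommandList {
    void SetPixelShader(const PixelShader*) {}
    void SetDepthState(bool depthWrite, bool equalOnly) {}
    void Draw(const Mesh&) {}
};
struct Scene { std::vector<Mesh> opaque, alphaTested; };

void RenderDepthPrepass(CommandList& cmd, const Scene& scene,
                        const PixelShader& alphaTestPS)
{
    // Phase 1: opaque geometry, no pixel shader - fast depth-only path.
    cmd.SetPixelShader(nullptr);
    cmd.SetDepthState(/*depthWrite*/ true, /*equalOnly*/ false);
    for (const Mesh& m : scene.opaque) cmd.Draw(m);

    // Phase 2: alpha-tested geometry, pixel shader does clip(alpha - cutoff).
    // This loses the fast path, but only for the geometry that needs it.
    cmd.SetPixelShader(&alphaTestPS);
    for (const Mesh& m : scene.alphaTested) cmd.Draw(m);

    // The main pass then runs with depth compare EQUAL and depth writes off;
    // alpha-tested geometry can be drawn as fully opaque there because the
    // prepass already carved the holes into the depth buffer.
}
```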
@@bazhenovc754 Okay, I should have explained that better. 🙂 After my depth-only depth-prepass without color writes, a sequence of a few draw calls (or just one big multi-draw indirect) with color writes enabled (for motion vectors and normals) directly follows, but still in the same "depth-prepass" renderpass from the point of view of my frame/render-graph implementation. For my engine, I took a lot of design ideas from Doom Eternal.
@@BenjaminRosseaux Yeah that makes sense, I misunderstood what you initially wrote. You could write motion vectors from the main forward pass too, I'm not sure if it's very useful to have them early.
I'm curious how you feel about visibility buffers. It's fundamentally deferred shading with 32 bits per pixel (64 if we're counting depth), and it also addresses some of the flexibility concerns, since you can make different pieces of the screen draw with different shaders, textures, attributes, etc.
The visibility buffer has its own pros and cons and it's probably a good topic for a separate video. Overall I like it, but I think that most games won't really benefit from it unless the art direction wants extremely dense and detailed geometry.
Visibility rendering shouldn't be considered forward or deferred. It's really just a different third thing
Hello! Decided to revisit this presentation to set up a proper forward renderer for our upcoming project. I'm wondering: what's the advantage of using RGB32_UINT instead of 3 separate targets, each with a specific format (R11G11B10_F, R16G16_(F|SNORM), R32_UINT)? How is alpha blending performed into such a render target? How do you bilinearly sample from it? Does it make reading from such a target less cache efficient? It seems like I'm missing some crucial knowledge.
You will not get correct results either way: g-buffer normals are not color, and if you use alpha blending on them you will get random results and it will look bad.
This is the same issue with deferred shading as well - you can't alpha blend normals and material properties; that data is not color.
So in that regard, the texture format doesn't matter that much. If you need bilinear sampling, you can split some of the data into separate filterable render targets, but try to keep the overall bandwidth under control (i.e. 128 bits in total: 3 MRTs - 32-bit filterable color, 32-bit filterable normal, 64-bit packed non-filterable), keeping in mind that MRT is going to be a bit more expensive even if the total bits per pixel are equivalent. The same goes for alpha blending: if you have a ton of translucent geometry, it might make sense to keep the color in a separate alpha-blendable format.
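For illustration, this is what manual bit-packing into a UINT target looks like; with RGB32_UINT everything is packed and unpacked by hand in the shader, which is also why "filtering" it is meaningless. (A sketch - 16-bit unorm pairs are just one common choice, e.g. for octahedral-encoded normals.)

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Pack two [0,1] values into one 32-bit uint channel (e.g. an
// octahedral-encoded normal); the reverse of what the shader does
// when reading the packed g-buffer.
uint32_t PackUnorm16x2(float a, float b)
{
    const auto q = [](float v) {
        return (uint32_t)std::lround(std::clamp(v, 0.0f, 1.0f) * 65535.0f);
    };
    return q(a) | (q(b) << 16);
}

void UnpackUnorm16x2(uint32_t packed, float& a, float& b)
{
    a = (float)(packed & 0xFFFFu) / 65535.0f;
    b = (float)(packed >> 16) / 65535.0f;
}
```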
For transparency you essentially have 2 options:
1 - Separable blending: the color target uses alpha blending, all other targets replace their data. This makes the color appear blended, but AO will use the translucent front surface as input, and AO won't be visible behind it.
2 - Only use alpha blending on the color and don't touch the g-buffer normals at all. This makes the color appear blended, but AO will use the opaque surface behind it as input - AO will be visible behind the translucent surface, but won't be applied to the translucent surface itself.
The same goes for DOF and anything else that uses g-buffer inputs in one way or another.
Depending on your game, you can use either 1 or 2 or both - the difference is in how these things interact with post-processing (different types of artifacts). Experiment and find the right combination that works for your game.
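Option 1 maps directly onto per-render-target blend state. A D3D12 sketch of what that setup could look like (not from the video; assumes the usual 8 simultaneous render targets):

```cpp
#include <d3d12.h>

// Separable blending: RT0 (color) alpha-blends, the remaining g-buffer
// targets simply replace their contents. IndependentBlendEnable is what
// allows per-target behavior.
D3D12_BLEND_DESC MakeSeparableBlendDesc()
{
    D3D12_BLEND_DESC desc = {};
    desc.IndependentBlendEnable = TRUE;

    // RT0: classic "src-over" alpha blending for the color target.
    desc.RenderTarget[0].BlendEnable    = TRUE;
    desc.RenderTarget[0].SrcBlend       = D3D12_BLEND_SRC_ALPHA;
    desc.RenderTarget[0].DestBlend      = D3D12_BLEND_INV_SRC_ALPHA;
    desc.RenderTarget[0].BlendOp        = D3D12_BLEND_OP_ADD;
    desc.RenderTarget[0].SrcBlendAlpha  = D3D12_BLEND_ONE;
    desc.RenderTarget[0].DestBlendAlpha = D3D12_BLEND_INV_SRC_ALPHA;
    desc.RenderTarget[0].BlendOpAlpha   = D3D12_BLEND_OP_ADD;
    desc.RenderTarget[0].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;

    // RT1..7: blending disabled, so normals/material data are replaced.
    for (int i = 1; i < 8; ++i)
        desc.RenderTarget[i].RenderTargetWriteMask = D3D12_COLOR_WRITE_ENABLE_ALL;

    return desc;
}
```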
Thanks for the presentation. I'm not a graphics programmer, but an Unreal Engine newbie learning tech art. I noticed that when I enabled the forward renderer + FXAA, the visuals improved greatly, plus I got about 50 to 60 fps running a basic level on a Ryzen APU. I'm glad to see the forward renderer being recommended despite what the Unreal docs say.
They added a forward renderer because their deferred renderer didn't work for VR (too slow). It's pretty good, but some features are missing or not supported.
@@bazhenovc754 I always assumed forward rendering had been there all along, even before UE4.
@@bazhenovc754 For mobile, at least: mobile forward supports custom data, which is key for stylized/toon renderers, while mobile deferred does not.
Shame you didn't touch on the topic of having lots of cutout geometry, like rendering forests. You can't just skip the pixel shader on those, although you could set the color mask to 0.
I know that Unity's Book of the Dead demo still had a massive benefit from using a prepass even with deferred rendering. Still, I also read in "Geometry Pipeline in Call of Duty" that forward kinda sucks for cutout geometry?
You only need to render the prepass with alpha clip enabled for that geometry, and render it after all the opaques. Then the main pass can be rendered fully opaque with earlyZ - the depth buffer will naturally have gaps where the texture alpha was 0.
The prepass with a pixel shader will be more expensive, and I don't think that's a solvable problem. You can optimize the geometry, e.g. www.humus.name/index.php?ID=266, to reduce that overhead.
Awesome presentation. This is gonna help with my hobby game engine.
CPU frustum culling is damn cheap, but draw calls aren't, so unless your system runs 100% on the GPU, there is no reason to have GPU frustum culling.
Indirect draw calls are a prerequisite here. You basically have an array of draw call structures and a counter. A compute shader does the culling test and, if it passes, atomically increments the counter and writes the draw call values. Then it's submitted for execution using a SINGLE CPU call (e.g. ExecuteIndirect on DX12).
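Expressed as runnable C++ for clarity (on the GPU this loop body is one compute shader thread and `counter` lives in a UAV; the args struct mirrors D3D12_DRAW_INDEXED_ARGUMENTS, everything else is an assumption of this sketch):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

struct DrawIndexedArgs // mirrors D3D12_DRAW_INDEXED_ARGUMENTS
{
    uint32_t IndexCountPerInstance, InstanceCount, StartIndexLocation;
    int32_t  BaseVertexLocation;
    uint32_t StartInstanceLocation;
};

struct Object { float center[3]; float radius; DrawIndexedArgs args; };

// Frustum-cull objects and compact the survivors' draw arguments into one
// buffer. The CPU then issues a single ExecuteIndirect over (outArgs, counter).
void CullAndEmit(const std::vector<Object>& objects,
                 const float planes[6][4], // xyz = plane normal, w = distance
                 DrawIndexedArgs* outArgs, std::atomic<uint32_t>& counter)
{
    for (const Object& o : objects) // one GPU thread per object
    {
        bool visible = true;
        for (int p = 0; p < 6 && visible; ++p) // sphere vs. frustum planes
        {
            const float d = planes[p][0] * o.center[0]
                          + planes[p][1] * o.center[1]
                          + planes[p][2] * o.center[2] + planes[p][3];
            visible = d > -o.radius;
        }
        if (visible)
            outArgs[counter.fetch_add(1)] = o.args; // atomic append
    }
}
```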
Very information-dense presentation, thank you!
Thanks for the great in-depth material, man! One day I might be able to grasp everything you say, but not today unfortunately.
What do you think about deferred shading on modern mobile devices with an open-world 3D environment but only a few local lights? Would your advice still be so radically against deferred?
Thanks.
Yes, I would 🙂 Bandwidth is very expensive on mobile.
Do you happen to have a link for that Doom Eternal presentation? (Is it a talk, or only a set of slides?)
The link is in the description: advances.realtimerendering.com/s2020/RenderingDoomEternal.pdf
@@bazhenovc754 I saw that after a while - it's not a very helpful PDF :(
There should be a talk somewhere
A very interesting presentation. Thank you, it gives me a lot to think about.
Allowing for custom BRDFs in Forward+ is greatly underrated. No serious graphics programmer should be happy with Disney-GGX'ing everything like most modern deferred games.
You also missed talking about bloom and AO, which are everywhere and big enough to mention in the pipeline overviews, IMO. True forward -> SSAO is bad since it lacks normals; it needs depth + normals. Deferred = everything it needs is already guaranteed.
20:27 "Forward shading still produces a g-buffer".
No, it doesn't. There's no "g-buffer" _at all_ in forward rendering. 🤨 Are we talking about the same thing?
He's saying that you need to produce TAA motion vectors and SSR normal/metalness/roughness. That's a G-buffer.
I guess it refers to the depth prepass that forward rendering may rely on to prevent overdraw; it may also include additional data for further effects. But overall, forward shading means all the lighting calculation is done per mesh, not per light in a different pass.
G-buffer is needed for some post processing.
@@Asinsful You can obtain a position buffer and a normal buffer from the depth buffer; you don't need 2 other full-screen textures.
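A sketch of that reconstruction (assumes a standard D3D-style perspective projection with depth in [0,1] and no reversed-Z; parameter names are illustrative). Normals can then be derived from neighboring reconstructed positions, with the caveat raised in the reply below that you lose normal-mapped detail:

```cpp
struct Float3 { float x, y, z; };

// Reconstruct view-space position from a depth sample. `u`, `v` are the
// pixel's [0,1] screen coordinates, `depth` is the raw depth buffer value.
Float3 ViewPositionFromDepth(float u, float v, float depth,
                             float nearZ, float farZ,
                             float tanHalfFovY, float aspect)
{
    // Invert the projection's depth mapping to get linear view-space Z.
    const float viewZ = nearZ * farZ / (farZ - depth * (farZ - nearZ));
    // Unproject the screen position onto the view ray at that depth.
    const float ndcX = u * 2.0f - 1.0f;
    const float ndcY = 1.0f - v * 2.0f; // screen Y goes down, view Y goes up
    return { ndcX * tanHalfFovY * aspect * viewZ,
             ndcY * tanHalfFovY * viewZ,
             viewZ };
}
```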
@@zdspider6778 but you can't have texture normal maps that way.
Interesting talk, thank you for sharing it!
When did people start calling them "deck-ullz"? It's always been "dee-kal", where "kal" rhymes with "pal" or "Cal" in "California". This is the second time this week I've seen a gfx programmer say "deck-ull" on YouTube. It's even "dee-kal" on dictionary sites (literally) and in hundreds of older videos. Why is the pronunciation changing suddenly? Have people just only seen the word and never heard it said before somehow, in this day and age of streaming video? It reminds me of watching game reviewers say "lin-neer" for the word "linear" because they've only read it and never heard it: they see the word "near" in it and automatically assume it's a 2-syllable word instead of the 3-syllable word it's always been.
I used both "deck-ull" and "dee-kal" pronunciations in the video to offend as many gfx programmers as possible.
22:11 _(regarding SSR)_ "You cannot do this with deferred shading because you need the normal and roughness".
I think you got that BACKWARDS, chief. 😆
You can't do that in a *_forward_* renderer, because you don't have the roughness as a separate render target. But you totally CAN with deferred! "Deferred" means putting it off (or "deferring" it) until later in the frame timeline, so that you do the (potentially) expensive lighting calculations _ONCE_ per pixel.
You need to work on your listening comprehension, chief 🙂 You are taking phrases out of context and tilting at an imaginary windmill - the phrase "cannot do this" does not refer to SSR, and I explained earlier in the video what a forward renderer means in the context of this presentation.
Alien Isolation runs on my toy laptop; Doom 3 / Quake 4 don't. Why? I know that Alien Isolation uses deferred rendering, and Doom 3 and Quake 4 do some stupid shit.
My laptop is 0.3 TFLOPs with 128MB of VRAM.
I don't like deferred shading because I find it verbose, although it might be more efficient. I'm struggling to decide whether or not to implement deferred shading, because it would also change my shaders a lot.
Are AMD's GPUs really _that_ unpopular? o_O
Can I post this video to bilibili? I'll cite the source
Yes, I have no problems with that.
@@bazhenovc754 Thank you very much, this video is very enlightening
This talk clearly targets immediate-mode GPUs. You make a lot of statements that are hard to defend on tile-based architectures.
Can you elaborate on which statements? I admit that most of my experience is with consoles and PC, although I did work a bit with various mobile GPUs.
There is TBDR and TBIR; you don't need a depth prepass on a TBDR because the HW already deals with overdraw for you via FPK/HSR, so a depth prepass just isn't doing anything useful. But you still need one on a TBIR if you have a forward renderer. All my other arguments should still apply, though?
I know that you can try to keep the G-buffer in tile memory, and that mitigates a lot of the bandwidth issues, but if you need the G-buffer for postprocessing (SSR or anything that needs arbitrary sampling of the render target), then you're forced to move it off-tile anyway, or scale down and make other concessions (same issue with an on-tile depth buffer - if anything in your postprocessing needs it later, you can't really keep it on-tile). Not to mention that on-tile memory is a scarce resource, and there might not be enough of it for a fat G-buffer.
So I still don't see any distinct advantages of a deferred renderer over a forward renderer even on TBDR - on these types of GPUs a forward renderer should at worst perform no worse than a deferred renderer, and it comes with a lot of flexibility. And from what I've seen, all TBDR vendors recommend using a forward renderer anyway.
Hi @@bazhenovc754, great talk by the way - I forgot to mention that in my last comment xD.
As you pointed out, mobile GPUs have some form of HSR, which makes the depth prepass suboptimal, but you still need the depth for things like SSAO mask generation. Being able to stay in tile memory (you should be able to get more than enough for the G-buffer data) mitigates a lot of bandwidth - not all of it, since you still end up writing the G-buffer out once so later passes can use it, but at least you stop doing multiple writes/loads. Also, some of the comments lean heavily on the assumption of compute for everything, which holds in desktop and console territory, but on mobile the general advice is still to maximize the use of fragment shaders.
So overall I agree: a forward renderer, as long as you have a really limited number of lights, is going to be the most efficient approach. Unfortunately, modern graphics tend not to align well with this approach, as you might need access to material information later in the frame (i.e. screen-probe-based GI techniques, reflections...), which forces people to write some sort of G-buffer anyway.
Funny enough, I'm actually using deferred rendering to upgrade an old game called GunZ: The Duel. I'd rather use Forward+, but I'm not willing to upgrade the game's entire engine from Direct3D 9 😂
I tried it, but I'm having a problem: I'm on a decent PC but I can't enable it. I've added a PBR shader, and it also has DX12.
Actually, you should never use TAA
Ahaha, dude has never heard about tiled rendering architectures.
Everything he says here is only relevant for immediate-mode architectures. So use deferred on tiled architectures.
@@homematvej If you have any kind of complex post-processing, you are going to move the g-buffer off-tile, and with forward rendering that's 2 render targets fewer than deferred. The post-processing is unchanged in both cases.
By “complex” I mean effects that need to access adjacent pixels - any kind of antialiasing for example. Sure you can run simple stuff like tone mapping on tile, but then your AA quality will suffer if you run it post tonemap.
I can assure you that I’ve heard about tile based architectures and I can also point you to vendor documentation that clearly recommends forward shading - community.arm.com/arm-community-blogs/b/graphics-gaming-and-vr-blog/posts/killing-pixels---a-new-optimization-for-shading-on-arm-mali-gpus
You need to learn more about the hardware, TBDR in particular.
Well, this isn't good, as Minecraft is gonna be using deferred rendering 😢
Minecraft Bedrock
What about Forward+? Is the technique discussed here better than Forward+?
The kind of forward renderer discussed in this video is firmly in the "Forward+" category already. Depth/normal/etc. prepasses, occlusion, clustered lighting and decals - all of these are "Forward+" things.
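For readers new to the term, the defining piece of "Forward+" is the per-cluster light loop inside the forward pixel shader. A C++-flavored sketch (all types and EvaluateBRDF are stand-ins, not anyone's actual implementation):

```cpp
#include <cstdint>
#include <vector>

struct Light   { float position[3]; float radius; float color[3]; };
struct Surface { float normal[3]; float albedo[3]; float roughness; };

// Stand-in for whatever lighting model the material uses - in a forward
// renderer each shader can plug in its own BRDF here.
float EvaluateBRDF(const Surface&, const Light&) { return 1.0f; }

// Per-cluster light lists, flattened: cluster i owns
// lightIndices[offsets[i] .. offsets[i] + counts[i]).
struct ClusterGrid {
    std::vector<uint32_t> lightIndices, offsets, counts;
};

// The per-pixel heart of a Forward+ shader: the pixel finds its cluster from
// screen position and depth, then walks only that cluster's lights - which is
// what lets a forward renderer scale to many lights without a g-buffer.
void ShadePixel(const Surface& s, uint32_t clusterId, const ClusterGrid& grid,
                const std::vector<Light>& lights, float outColor[3])
{
    outColor[0] = outColor[1] = outColor[2] = 0.0f;
    const uint32_t first = grid.offsets[clusterId];
    for (uint32_t i = 0; i < grid.counts[clusterId]; ++i)
    {
        const Light& l = lights[grid.lightIndices[first + i]];
        const float f = EvaluateBRDF(s, l);
        for (int c = 0; c < 3; ++c)
            outColor[c] += l.color[c] * f;
    }
}
```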
Nice video, but disliked for the horrific microphone - so much breath.
I bought a new expensive one, next video should be better.