Great video.
I see that other people have already mentioned using billboards at distance, but there are some other, intermediate optimizations that I think you could use as well. Just as a thought experiment, I know you're no longer working on this.
- The GDC talk on the grass in Ghost of Tsushima widens the grass at further distances while drawing fewer blades. They make the blades twice as wide for half as many blades. While they generate their grass blades in the geometry shader, this should still be possible with the 3D models you are using in the vertex shader.
- The same GDC talk mentions how they use a grass model that forms a V, which gives them more coverage with fewer blades. It will make the grass look different, though.
I read through the shader code for the grass and there are a number of smaller optimizations that could be employed. I don't know if they would actually do any good, and the compiler may even do them for you, but they could be worth trying out just to see.
- You use "v.uv.y * v.uv.y" three times in the vertex shader, twice in "_Scale * v.uv.y * v.uv.y". If you put it into a temporary variable, the GPU would keep it in a register rather than computing it every time it's needed.
- RotateAroundXInDegrees and RotateAroundYInDegrees are fairly expensive functions, but much of that stuff can either be pre-computed or computed once (per vertex). RotateAroundXInDegrees is only called once, with a constant value passed into "Degrees". You can store 'm' as a constant since it never, ever changes. RotateAroundYInDegrees is similar: the m for idHash * 180.0 only needs to be computed once per vertex.
- You can inline the RotateAround_InDegrees functions to avoid the overhead from function calls.
- Floating point arithmetic is non-associative, which means optimizations that may be obvious won't be performed by the compiler because you won't get exactly the same answer. UNITY_PI / 180.0 should just be a constant, but in order to tell the compiler that, you have to put it in parentheses, or just #define a constant.
- If you put positionBuffer[instanceID].displacement into a temporary variable, you'll be telling the compiler to load the value into a register rather than reading from cache, or worse, VRAM, every time it needs it.
- There are likely some code motion optimizations to be made, particularly in concert with the above, but I don't know enough about the hardware or compiler to really give many good recommendations for it. I will say that putting all of the positionBuffer[instanceID] reads next to each other improves temporal locality and may have an effect on performance.
As I mentioned, a lot of this stuff may not even help. It's all very much "try it and see". The compiler may do most or all of this already, so the end result is just making your code impossible to maintain to get an extra 3 fps. There's also Amdahl's law. I may be focusing all of this attention on some parts of the procedure that take up 1/10 of the actual execution time.
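To put a rough number on the Amdahl's law caveat, here is a minimal sketch. The 1/10 share of execution time is the figure from the comment; the 2x local speedup is purely an assumed value for illustration, not a measurement of the actual shader.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime becomes `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# The tuned code is 1/10 of the frame (from the comment); the 2x is assumed.
overall = amdahl_speedup(0.10, 2.0)

# Upper bound: even if the tuned 10% became free, the other 90% remains.
ceiling = 1.0 / (1.0 - 0.10)

print(f"{overall:.3f}x overall, {ceiling:.3f}x at the very best")
```

So under these assumptions, doubling the speed of that tenth buys about 5% overall, and nothing done to it can buy more than about 11%.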
With all this new knowledge on video game grass (which is not something I expected life to bring to me but I'm not mad) I feel the urge to make an open world that's just a grassy field. A really really nice grassy field.
A thought occurred to me when culling and chunks were mentioned. Would it be possible to further optimize by reducing the number of grass blades relative to distance each chunk is from the camera? While this may not reduce overall GPU load considering the extra calculations it could at the very least extend the range of which grass is visible, while simultaneously allowing for a gradient fall-off, which eliminates the need for a fog that could obstruct distant objects like mountains or buildings.
That'd probably result in it looking noticeably sparse in the distance - which could then be fixed by making the blades thicker, similar to what mipmapping does to transparent textures. Alternatively, billboards in the distance, funny blades up close.
@@AlphaGarg i think zelda botw has this, although it's not that noticeable nor is it too cumbersome given the gradient-shader it has going on! I think this is a great idea!
Another option would be to replace the more detailed 3D grass with animated billboarded images. Or, you could replace the texture of faraway areas that have grass with a grassy texture. A mix of all of these would probably be desirable
@@myrealusername2193 They could use 1 tri polygons for the entire thing, the problem though is you need to enable alpha testing which is expensive. They could experiment with different configurations to see which is more optimal, such as high poly mesh with no alpha testing, vs. 1 tri mesh with alpha testing enabled. It might even depend on what gpu is used also.
This is great because you're optimising the grass based on the worst case scenario. I think the point you made at 7:35 is important: your actual game world is probably never going to have a scene with that much grass in a single frame. Come to think of it, the only area of BotW with that level of grass density is Hyrule field, but it's important to note that there is almost nothing else populating Hyrule field.
awesome demo, in breath of the wild, the "high poly" grass is only two triangles, in an elongated diamond shape and it still looks good, I wonder how much of a performance boost that would give your version?
@@Acerola_t I’d imagine it’s much more worth it on a Switch due to its low power and only 4 gigs of shared RAM. Lower polygon counts would save a decent amount of RAM as well as adding performance.
@@myrealusername2193 The memory cost comes from the stored positions of each grass blade. The grass blade mesh itself is not an issue at all, especially when we're talking about a 1 or 2 tri difference.
@@Acerola_t hmm, would that be from only storing the mesh once and then drawing it at each position instead of storing an object with all of the geometry at every point? I don’t really have any experience with 3D graphics (all the games I’ve made are web games with the only animation being the occasional fade in/fade out or animated flags) so I don’t really know how exactly all the geometry and objects and things are stored in memory. I’d imagine it depends for each system, though there might be some standard way all shaders and engines generally use. I’m surprised that the positions take up so much memory though, are they stored as floats or something? Or like a 64-bit signed integer would likely take up a lot of space as well.
@@myrealusername2193 Buffering the offsets is 96 bits (three 32-bit signed floats, so 12 bytes) × 7 million grass blades, which is ~670 Mbit, or roughly 84 MB, just for the offsets. The GPU does have options for lower precision floating point representations, but you're working with worldspace co-ordinates, so you would end up with grass looking extremely wonky after traveling a relatively short distance from origin due to each vertex getting subtly (or not so subtly) "shifted" by precision loss. You -could- try to manage that by introducing some kind of "origin shifting" logic, but I suspect you'd end up with some subtle bugs, and troubleshooting shaders is not a lot of fun because you can't really do stuff like step-based debugging on the GPU - there be dragons. Origin re-mapping is hard enough to manage on the CPU without bugs.
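The arithmetic is easy to check. This is only a sketch of the offsets alone, assuming exactly three 32-bit floats per blade and the 7 million blade count quoted above; the real buffer layout in the project may carry more fields.

```python
import struct

BLADES = 7_000_000                          # blade count quoted in the thread
bytes_per_offset = struct.calcsize("<3f")   # three packed 32-bit floats
total_bytes = BLADES * bytes_per_offset

print(bytes_per_offset, "bytes per blade")
print(total_bytes / 1e6, "MB")
print(total_bytes * 8 / 1e6, "Mbit")
```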
Thank you! Compute shaders are my favorite thing ever, they're so powerful. If you want some good resources for getting started with them I recommend: kylehalladay.com/blog/tutorial/2014/06/27/Compute-Shaders-Are-Nifty.html and blog.three-eyed-games.com/2018/05/03/gpu-ray-tracing-in-unity-part-1/
Also one way to reduce the memory usage would be to remove the cached displacement value and instead sample the height map in the vertex shader, but it would add 9 texture samples per grass blade (3 for LOD) so I think taking the memory hit is more ideal. Additionally you could probably bit pack the uv coordinates, but I'm not confident on that.
@@Acerola_t Can't you attach extra data to a mesh instance that'd get passed to the vertex shader? You'd only need to query the heightmap texture once then, on instancing. (Not a GPU guy, I don't know if I'm making sense.) Also, bit packing the coordinates seems very doable. Store a reference centre coordinate in full 32-bit floats for each chunk in a quad-tree, then apply a -1.0 to 1.0 uv-coordinate space "modifier" for each leaf of grass - with only a byte-size integer, you could index a 0.5m x 0.5m chunk to every ~2mm on each axis. Looks to me like you could get down to 2 bytes minimum, 4 if you need a bit more resolution (half-floats, if available, would work great), assuming the z axis can be omitted; if it can't, it also doesn't need more than 1-2 bytes, since the chunk can also define a reference average or middle offset.
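A minimal sketch of that packing idea, with one byte per axis. The function names are hypothetical, and for simplicity it measures from the chunk's minimum corner rather than from a centre with a -1.0 to 1.0 range as described above; the resolution works out the same.

```python
def pack_offset(x, z, chunk_x, chunk_z, chunk_size=0.5):
    """Quantize a blade's position inside its chunk to one byte per axis.

    (chunk_x, chunk_z) is the chunk's minimum corner, stored once per chunk
    at full precision (an assumption of this sketch).
    """
    def quantize(value, origin):
        t = (value - origin) / chunk_size          # 0..1 across the chunk
        return max(0, min(255, int(t * 256)))      # 256 cells, clamp the edges
    return quantize(x, chunk_x), quantize(z, chunk_z)


def unpack_offset(qx, qz, chunk_x, chunk_z, chunk_size=0.5):
    """Recover an approximate world position (the centre of the cell)."""
    step = chunk_size / 256                        # ~1.95 mm for a 0.5 m chunk
    return chunk_x + (qx + 0.5) * step, chunk_z + (qz + 0.5) * step


qx, qz = pack_offset(10.123, 20.456, chunk_x=10.0, chunk_z=20.0)
print(qx, qz, unpack_offset(qx, qz, 10.0, 20.0))
```

Each blade then costs 2 bytes for x and z instead of 8, and the reconstruction error stays within half a cell, about 1 mm here.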
I doubt it matters to say this at this point, but I think another option for covering up the fact that you're culling the far-away grass is to simply color the ground the same as the grass. At that distance, I doubt you would even notice that the grass stopped existing. If the animation of the grass changes the color of it enough to give it away, you might also consider applying an animation to the ground itself to match the change. I say this because although fog is very common in games, it makes sense to strongly consider other options before using it given how it destroys any hope of viewing landmarks or terrain in the distance. Standing on top of a hill and looking out to the surrounding terrain and having it fill you with a sense of awe and wonder is a very desirable effect, and cutting corners elsewhere to achieve this effect is often worth it. Good video and I'll look forward to any new content.
No doubt someone has already said this, but you might be able to use a mesh with more blades of grass and decrease the density. Sure, each mesh would have more polygons, but you'd be keeping a track of way less meshes, no doubt making it cheaper on the GPU's VRAM. You could also put less blades of grass into the lower LOD meshes to make a smoother transition between the grassed and grassless areas. Oh and you could use billboards at a distance too. Not like anyone would notice.
Those would be great further optimizations, quite frankly the only reason there's so many meshes in the example is to demonstrate how fast the technique is at extreme numbers lol
This is cool, you could probably optimize it further by combining your billboard grass with meshed grass and using the billboard ones at longer distances.
@@kered13 I don't think dithering would work well. Zelda Breath of the Wild actually uses this combination of mesh and billboard grass I was talking about, so apparently I'm not the first to think of this idea
You could probably save a lot of that memory by having the grass procedurally generated, so the memory would only be used for a brief period of time, but then the procedural generation would have to be extremely fast in order to not hinder the rendering performance if calculated per-frame. Nice explanation of how it all comes together to make grass like this possible.
Hoping you follow this up by adding Ghost of Tsushima's grass stuff: rounded normals for the grass blades to hide that it's 2D, varying height, shorter grass blades converted into 2 blades to make use of the vertices and add density
Thanks for the video. In case it helps anyone, I was able to triple the framerate for the geometry grass shader simply by pulling in keijiro's latest noise include and using the 2D simplex version instead of the 3D snoise which is overkill since the Y isn't required.
You'd have to get much fancier to do that, since this stuff exists only on the GPU then physics aren't possible. For stuff like trail tracks you'd have to keep track of a global texture that overlays on the terrain and then you write to that texture as the player drives around, then the grass texture would sample from that texture to see if it has been smushed down or not.
Would be cool to see another intermediate LOD where it goes back to the textured quad approach for anything more than ~10m away. Would be near indistinguishable since it only looks bad when viewed from directly above, and unless you have an ortho camera that spot would be covered by the better grass. Might eliminate the need for fog.
I wonder if some of the optimizations used to render hair could also be used to render grass. Or if people working on grass optimization also used those to render better hair, fur, etc., because they're similar problems of "render a lot of this thing protruding from this other thing" except with hair the "other thing" may be moving.
I kept expecting at some point that you'd have a radius around the camera outside of which there weren't individual blades of grass, but just like a blob disguised as grass. Kinda like you did where you stopped drawing the grass altogether at a certain radius, but colored the same color as the grass rather than brown.
You can also check which grass is more than 3/5 covered and exchange it with the mid-detail model, or if it is a bit further away even use the lowest-detail model. And grass which can't be seen because other grass blocks the view can be taken out altogether, but for this you have to take the height and position into account.
Hi. I love your videos! I downloaded this grass project to play around with it a bit and understand everything in more detail. The grass culling is quite expensive. If I only cull every 5th frame, the FPS increases from ~270 to ~440, and if I don't cull at all, I can get to around ~480. At those frame rates I've found it's very difficult to see the lack of culling for 4 frames unless the camera moves/rotates extremely fast. If we can link the camera to the grass and only cull every second or third frame while the camera view is changing, we can actually get a considerable FPS boost.
damn bruh. Thats exactly the type of nerdy computer graphics stuff i dig. Im in no way close to coding my render engine, but i work in Houdini and instancing grass here is pain in the ass too.
I'm very happy I found this channel. Others are only really entertaining and fun I guess, but the real genius of game development is far more interesting. It also helps that your presentation is witty and concise. As a casual gamer and professional procrastinator I find this all super cool. :)
you can actually put a bit of sampling in the pixel shader, a bit like parallax occlusion mapping (Look it up, its a common method ppl employ) and u can actually develop a shadow from it, instead of the displacement map, and it will shadow it all into the distance! would definitely make it look cooler. Parallax occlusion mapping actually is the first raytracing type method ppl did on the gpu. Realtime.
This would be very useful for a mysterious, serene, foggy field in an experimental indie horror game which appears in two scenes for no apparent reason, and then places you back in the game, just to make you question whether what you're experiencing is *supposed* to be coherent or not.
Chunks are a blessing for both end-user and in-editor performance, and thank you for remembering that this thing called LOD exists, I was rendering entire scenes from far away off a city that didn't downscale the buildings and car models, first it wasn't a problem because there wasn't too much models or effects on the scene, but after sometime I was getting 30 fps in game and 50-60 in editor. I will try to implement LOD. Many thanks.
I know you said you wanted to be done with the grass, but I'm curious: is it possible to have objects (players and NPCs) affect the grass movement in a performant way? (flattening or making it shake as it's moved through) I've seen other tutorials on this for snow and grass using a displacement texture but I'd be curious to know if it works with this grass technique
Yeah the same technique is used for snow and grass. The player's position is written to a texture that overlays across the field and the shader then samples from that texture to inform itself if it should be flattened or not.
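A rough CPU-side sketch of that texture-overlay idea, to show the data flow only. The names, the 64x64 resolution, the 32 m field size and the decay step are all assumptions made up for this example; in practice the texture would live on the GPU and the sampling would happen in the grass shader.

```python
SIZE = 64       # texels per side of the hypothetical trail texture
FIELD = 32.0    # world-space width of the field it covers, in metres

trail = [[0.0] * SIZE for _ in range(SIZE)]   # 0 = upright, 1 = flattened


def world_to_texel(x, z):
    """Map a world position on the field to a texel, clamped to the edges."""
    u = min(SIZE - 1, max(0, int(x / FIELD * SIZE)))
    v = min(SIZE - 1, max(0, int(z / FIELD * SIZE)))
    return u, v


def stamp(x, z):
    """Game side: record that the player stood at (x, z) this frame."""
    u, v = world_to_texel(x, z)
    trail[v][u] = 1.0


def decay(rate=0.05):
    """Optional: let the grass spring back a little each frame."""
    for row in trail:
        for i, value in enumerate(row):
            row[i] = max(0.0, value - rate)


def blade_height(x, z, base_height=1.0):
    """Shader side: each blade samples the texture at its own position."""
    u, v = world_to_texel(x, z)
    return base_height * (1.0 - trail[v][u])


stamp(16.0, 16.0)
print(blade_height(16.1, 16.2), blade_height(1.0, 1.0))
```

The grass itself never needs to know about the player; it only reads whatever has been written into the shared texture.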
Perlin noise texture input for grass splat map and you have one helluva grass shader ^-^ Basically no real scene needs that much grass everywhere so yeah this is pretty great!
Didnt realize a video about John Lennon talking about the grass could be so fun and educational. But I honestly didn't think you'd Reincarnate till Yoko was gone.
Since the grass is now stored in chunks (I also saw someone suggesting using quadtrees for better, more fine-grained culling), you maybe don't need to store as much data on the leaves' positions. Let the chunk itself store its world coordinates; let the leaves' positions be defined relative to the chunk. That way, you might be able to store what are currently floats as 1-byte integers or something. I saw this technique used in a video about a voxel-based game engine, but I'm thinking that it could work in any scenario with a well-defined world grid of some sort being used.
Considering the number of grass blades increases exponentially with distance from the camera, you really want to cut down on the number you render at a distance as well. If you had a narrow band of low poly grass, then filled the following chunks with 2D decals, you could probably save even more VRAM. Rendering every third point of grass as one decal with three grass straws could give a final performance boost.
@@drdca8263 Yes, you're probably correct. I didn't think through this comment before posting it. The main point is still true though, the last row of grass is much harder to render than the closest one. So any cheap trick you can make in the distance could be worth it.
@@drdca8263 If you're thinking in terms of exponential functions in fancy maths (and, relatedly, running time), sure. But I think in other instances, "exponential" simply refers to whenever there's a parameter being raised to some power. And we're talking about grass straws here. The number of those would be going up with the area of the circle defined by the view distance from the camera; pi*r^2. That could be said to be polynomial, but colloquially also "exponential".
@@mnxs I’m conflicted. On the one hand, I have had objections to people insisting that because people who study a particular topic use a certain word a certain way, that other people are also obligated to use the word that way rather than a different way. But, at the same time, I strongly prefer that everyone did not refer to things that grow polynomially as “exponential”. This is probably somewhat hypocritical of me… I guess it is because I think math is so great and i want everyone to know it? But like, people who study other fields probably feel the same way about their favored field.
What have you done to me ? Im obsessed with grass now.. Im supposed to be a gamer not some kind of.. real life human thing who goes outside or whatever that is.. Anyways great video lol, Im wondering about how you could fix the camera issue you were talking about using the original method tho
why would u store each chunk individually, when u can use a single buffer multiple times? Make the grass repeat along its edges, and read the buffer with a scalar value, corresponding to the chunk position. It would be slightly slower, but you wouldn't have the VRAM issue. you will also have to change the culling of grass to a chunk-based manner, instead of doing it per grass model in each chunk. basically stopping at 2:12 instead of 2:13. It might actually give you performance, as sorting buffers can be generally expensive in terms of performance. Also what about reducing the grass range even further, and adding a granular post effect layer to the terrain instead? Far in the distance the grass looks like noise anyway, u just have to map the displacement wind noise to it in some way to make it look realistic. Also u would have to use the shader to light the visible terrain to sky edge. Would love to hear why my suggestions wouldn't work. Destroy me :D
Great series! I learned a lot. Could you save space by rendering multiple blades per position? For example, one position could have 3-4 off shoots. Also, not sure if this will make a difference, but depending on the hilliness of your maps, the grass could skip rendering on the downslope side of the hill.
So in conclusion, this is very doable, but you need to keep that final cost in mind and weigh whether or not it's applicable to the project you're working on. Duly noted. I was considering using this for a VRC world where the only other performance heavy things are 1, people's avatars, and 2, a video player. I'll have to see how much of an impact it will have with other level geometry loaded in.
What if you gave the ground after the cutoff distance the same color as the actual grass, and applied the same noise on top of it to make it look darker as if there is grass moving there too?
In the far off distance, all the grass blurs together. Is it possible to instead have a single surface that replaces the grass at some distance and have the texture be some function of color calculated by how much the grass should be bending in that region (from the effect of either wind, something in the field, length, etc etc)?
Thank you chunks! Could you reduce the memory cost if you store only the Ids for referencing the blades from the first buffer inside the second buffer?
Hi! I've been subscribed for a couple of days, and I'm loving your tutorials. You give just the right amount of detail needed to make it both fun to watch, easy to understand and follow along, I think you'd be a great teacher :) I'd like to ask something though, if you don't mind. What technologies/programs/ides do you use in your videos? You inspired me, and I want to start developing a game, just for fun and to see what happens! :D
I had idea... You talked about lod, but what if at distance it would be one big mesh? Like big mesh? You could actually do that, even in order: 1. best grass 2. lod'd grass 3. one big mesh with color of grass. It is very obvious from looking that at the distance all the grass looks like just one color... You could abuse that ig...
What if you used this chunking method and still, after a set distance, transitioned to the GPU instanced grass from the first video for your LOD grass? Would that help performance gains? Would there be a noticeable line at the cut off and, if so, could it be mitigated?
Unity docs say "don’t use GPU instancing for meshes that have fewer than 256 vertices." Maybe it's more performant to batch groups of blades as a single model right inside the compute shader and gpu instance those groups instead.
As someone who has worked on serious games/milsims grass is a fuckng nightmare because you have an expectation, especially in multiplayer environments, that if you are lying prone in grass, then people far away from you shouldn't be seeing grass de-render and your soon-to-be corpse lying on a flat plane totally exposed. Honestly still no good solutions to this day.
Regarding the memory efficiency, I believe UV and perhaps even displacement are unnecessary. UV should just be the same as Position.xz (unless Position is in camera space, in which case you could at worst introduce a camera pos uniform) and for the vertical displacement, it should be a solution to take like VertexID % 7 and use that to index the height in a constant array. (This is assuming that you have 7 vertices on a grass blade, you could insert a different number)
How feasible would it be to use a geometry shader to generate the positions of the grass as well based on what's visible in the frustum in the first place rather than sending the initial position buffer to the GPU. Like for each chunk you'd generate N positions for grass blades (pseudo-randomly but deterministic so it's the same every frame) using the height map displacement of the ground to get the base position, and from there generating the actual grass geometry. You'd cut about half the memory usage, but at the cost of more processing. Also now I'm wondering if it would be possible to do this as a separate pass entirely using deferred rendering somehow, with a masking pass to mark where grass blades are supposed to go (probably wouldn't work because obviously some grass has to start off screen, though extended buffers might fix that? I don't know, I'm completely spitballing here)
@@Acerola_t - This is also keeping the grass in memory even when it's culled though, right? I wonder how effective it would be to cull each chunk on the CPU, and swap out non-visible chunks on the fly. It would mean swapping in 250k grass blades each time a chunk came into view, so I'm not sure how much of a performance hit that would be (and probably pretty bad when quickly moving the camera). I'm going to be doing something similar to this soon-ish, so I'll try some things.
Did you take any inspiration from the Sucker Punch GDC talk perhaps? They have pretty similar solutions for their grass rendering in Ghost of Tsushima.
Your grass has been hitting the algorithm and for some reason I keep clicking. Could you add a license to the github so we have explicit permission to reference/reuse the code in your repo and so we know exactly what you are okay with us using it for? I might make a more basic and optimized version of this for a game I am currently working on but you technically are not allowed to copy repo code and use it in production unless there is a license saying that you can.
Thank you chunks :)
my mans out here sharing the secrets of AAA grass like it's no big deal. great work man, you're clearly very knowledgeable.
Haha thanks! My intent is to show how real world game assets might work.
@@Acerola_t you are the savior of indie shitters like myself
Idk man, AAA games run like shit with no regard to optimization so this is probably better than whatever they're doing.
@@friendofp.24 No, this is exactly what they're doing
@@Acerola_t It looks like you are living in a bunker hiding from the AAA cartel 🤣🤣🤣
Amazing video though, currently studying Software Engineering and will take a specialization in game dev, I am sure these videos will be helpful :D
Some things you can do to further optimize this: instead of having a single blade per unit of grass, have multiple.
This means you have significantly fewer objects to render.
Also, over distance start randomly culling the shorter blades; they are very unlikely to be seen so you can just remove those.
This removes the harsh edge as well because we are slowly fading in the grass instead of there being a harsh line all of a sudden.
Also put most of your detail at the top of the blade and less at the bottom, we’re not very likely to see the bottom of the blade because of the blades in front of it.
Greets someone who spent too much time on game foliage.
Too much time on foliage? No such thing.
@@AvarFeralfang except when its rendertime ;)
@FreekHoekstra fair enough. I do like nice foliage, though. 😀
I also thought about culling "randomly" at distance. Shorter grass first, but also with some noise to gradually cull less rather than stop abruptly.
@@RecOgMission I don't know if that's what they use in games but this is actually a great idea
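Here is one way the "shorter grass first, plus some noise" idea could look. This is only a sketch: the names, the fade distances and the 50/50 weighting between noise and height are assumptions, and `hashlib.md5` merely stands in for whatever cheap per-blade hash a shader would use.

```python
import hashlib


def blade_hash(blade_id: int) -> float:
    """Stable pseudo-random value in [0, 1) per blade (stand-in for a GPU hash)."""
    digest = hashlib.md5(blade_id.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:4], "little") / 2**32


def keep_blade(blade_id, height, distance,
               fade_start=40.0, fade_end=80.0, max_height=1.0):
    """Keep every blade up close, none past fade_end, and thin out in between."""
    t = (distance - fade_start) / (fade_end - fade_start)
    t = min(1.0, max(0.0, t))
    # Short blades score lower, so they drop out first; the hash adds the noise.
    score = 0.5 * blade_hash(blade_id) + 0.5 * (height / max_height)
    return score >= t


for d in (10.0, 50.0, 60.0, 70.0, 90.0):
    kept = sum(keep_blade(i, 0.5, d) for i in range(1000))
    print(f"{d:5.1f} m: {kept} of 1000 blades kept")
```

Because each blade's score is fixed, a blade that is culled at one distance stays culled at every greater distance, so the field thins out steadily instead of flickering.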
You can use a division structure(think 1 chunk has 4 sub chunks that each have 4 sub chunks) then frustum cull the chunks instead of individual blades for way faster culling.
that's a genius idea lmao
After your comment i wasted whole day researching different math topics. Thank you, it was interesting.
Yeah I wanted to hear them talk about the culling method more, they didn't really discuss it at all. I am currently puzzled how they updated its buffer every frame for millions of blades of grass.
How hard would it be to do an occlusion culling on the vertexes that are not visible because they are hidden behind other blades of grass?
Also it would look better if the grass on the edge inclined from opposite to camera to normal instead of just becoming visible. Or dithering if you want something a bit less expensive.
Isn't that what a quad-tree is? Or is that something different?
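For anyone wondering what the "chunks with 4 sub chunks" idea looks like in practice, here is a small top-down (2D) sketch. It is a simplification under stated assumptions: the frustum is reduced to a list of half-planes `(nx, nz, d)` where a point is inside when `nx*x + nz*z + d >= 0`, and all names are hypothetical rather than taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    x: float                      # minimum corner of the square
    z: float
    size: float
    children: list = field(default_factory=list)


def build(x, z, size, leaf_size):
    """Split a square into 4 sub-squares until leaves reach leaf_size."""
    node = Node(x, z, size)
    if size > leaf_size:
        half = size / 2
        node.children = [build(x + dx, z + dz, half, leaf_size)
                         for dx in (0, half) for dz in (0, half)]
    return node


def outside(node, plane):
    """True if the node's square lies entirely behind the half-plane."""
    nx, nz, d = plane
    # Only the corner furthest along the plane normal needs testing.
    px = node.x + (node.size if nx > 0 else 0)
    pz = node.z + (node.size if nz > 0 else 0)
    return nx * px + nz * pz + d < 0


def visible_leaves(node, planes, out):
    if any(outside(node, p) for p in planes):
        return out                # one test rejects the whole subtree
    if not node.children:
        out.append(node)
    for child in node.children:
        visible_leaves(child, planes, out)
    return out


root = build(0.0, 0.0, 64.0, 16.0)                 # 16 leaf chunks
leaves = visible_leaves(root, [(1.0, 0.0, -40.0)], [])
print(len(leaves), "of 16 chunks survive the x >= 40 plane")
```

The saving comes from the early return: a rejected parent never visits its children, let alone the blades inside them.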
You actually don’t even need the whole mesh past a few meters. Instead of swapping it out for an entire lower mesh count model, you can just plug in the grass tops, on top of this past a certain distance you don’t even have to apply a skew, as you can only see the tips just revert back to a simple lateral translation.
Have you thought about replacing far-away grass clumps with the first method (quad+texture)? This would probably work way better with the GPU instance positions being relative to the camera. Great vids btw :)
Or simply not rendering them and just having a tan-colored background that looks like the grass in the distance.
This is similar-ish to what I did several years ago on a shipped game. I used a precomputed quadtree rather than chunks because the area of coverage was known, and the frustum test was just iterating over the blades of the visible nodes and copying them to the final instance buffer (no scan and compact, just directly from one to the other). It populated an indirect draw structure at the same time, so the whole thing was essentially two calls (compute dispatch + indirect draw). Another way of hiding the end of the grass is to reduce the height of the blades in the vertex shader as they approach the far distance. If your grass is a similar color to the terrain you can't see where it ends, and then you also don't need fog. Another decent memory saving was to place only one tile of grass blades, treat each chunk as an instance of that base tile, and then expand things out in the visibility pass.
oh damn, reducing the grass height with distance is so smart.
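The height fade mentioned above is only a few lines of math. A sketch in Python, assuming a smoothstep falloff and made-up fade distances (the original comment does not say which curve or distances were used):

```python
def smoothstep(edge0, edge1, x):
    """Same shape as the HLSL/GLSL smoothstep intrinsic."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def blade_height_scale(distance, fade_start=40.0, fade_end=60.0):
    """1.0 up close, shrinking smoothly to 0.0 at the far cut-off."""
    return 1.0 - smoothstep(fade_start, fade_end, distance)
```

In a vertex shader the result would multiply the blade's local height before the world-space transform, so blades sink into the terrain instead of popping out of existence.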
Thank you chunks! I have no idea how hard it would slow things down but one subtle thing that makes grass look like grass is that it is not just darker at the bottom than the top, but also darker in the middle than at the edge and darker on one side than the other. You definitely don't see this shading nearly as often as the kind you did, but I have no idea whether it's too slow, or if it's just not something people think about.
Great video.
I see that other people have already mentioned using billboards at distance, but there are some other, intermediate optimizations that I think you could use as well. Just as a thought experiment, I know you're no longer working on this.
- The GDC talk on the grass in Ghost of Tsushima widens the grass at further distances while drawing fewer blades. They make the blades twice as wide for half as many blades. While they generate their grass blades in the geometry shader, this should still be possible with the 3d models you are using in the vertex shader.
- The same GDC talk mentions how they use a grass model that forms a V, which gives them more coverage with fewer blades. It will make the grass look different, though.
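A rough sketch of the "twice as wide for half as many" trade-off from the first bullet, in Python. The LOD step distance and the level cap are assumptions for illustration and are not taken from the talk.

```python
def lod_blade_params(distance, base_count, base_width, lod_step=30.0, max_level=3):
    """Halve the blade count and double the blade width per LOD level."""
    level = min(int(distance // lod_step), max_level)
    count = base_count >> level          # half as many blades per level
    width = base_width * (2 ** level)    # twice as wide per level
    return count, width
```

The product of count and width stays constant, so the ground coverage is roughly preserved while the number of instances drops.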
I read through the shader code for the grass and there's a number of smaller optimizations that could be employed. I don't know if they would actually do any good, and the compiler may even do them for you, but they could be worth trying out just to see.
- You use "v.uv.y * v.uv.y" three times in the vertex shader, twice in "_Scale * v.uv.y * v.uv.y". If you put them into a temporary variable the GPU would keep them in a register rather than computing them every time they're needed.
- RotateAroundXInDegrees and RotateAroundYInDegrees are fairly expensive functions, but much of that stuff can either be pre-computed or computed once (per vertex). RotateAroundXInDegrees is only called once, with a constant value passed into "Degrees". You can store 'm' as a constant since it never, ever changes. RotateAroundYInDegrees is similar, the m for idHash * 180.0 only needs to be computed once per vertex.
- You can inline the RotateAround_InDegrees functions to avoid the overhead from function calls.
- Floating point arithmetic is non-associative, which means optimizations that may be obvious won't be performed by the compiler because you won't get exactly the same answer. UNITY_PI / 180.0 should just be a constant, but in order to tell the compiler that, you have to put it in parentheses, or just #define a constant.
- If you put positionBuffer[instanceID].displacement into a temporary variable, you'll be telling the compiler to load the value into a register rather than reading from cache, or worse, vram, every time it needs it.
- There are likely some code motion optimizations to be made, particularly in concert with the above, but I don't know enough about the hardware or compiler to really give many good recommendations for it. I will say that putting all of the positionBuffer[instanceID] calls next to each-other improves temporal locality and may have an effect on performance.
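To make the non-associativity and "compute it once" points concrete, here is a small Python illustration. It only demonstrates the arithmetic; it says nothing about what a particular shader compiler does. `DEG2RAD` and `rotate_around_y` are hypothetical names, and the rotation sign convention is an assumption.

```python
import math

# Regrouping a floating point sum changes the result, which is why a
# compiler cannot freely reorder "degrees * PI / 180.0" on its own.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)

# Grouping the constant part explicitly makes it a single precomputed value.
DEG2RAD = math.pi / 180.0

def rotate_around_y(x, z, degrees):
    """Rotate (x, z) about the Y axis, computing sin/cos once per call."""
    rad = degrees * DEG2RAD
    sin_r, cos_r = math.sin(rad), math.cos(rad)
    return (cos_r * x - sin_r * z, sin_r * x + cos_r * z)
```

When the angle itself is a constant, `sin_r` and `cos_r` can be hoisted out as well, which is the same idea as storing the matrix `m` once.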
As I mentioned, a lot of this stuff may not even help. It's all very much "try it and see". The compiler may already do most/all of this already and so the end result is just making your code impossible to maintain to get an extra 3 fps. There's also Amdahl's law. I may be focusing all of this attention on some parts of the procedure that take up 1/10 of the actual execution time.
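The Amdahl's law point can be checked with one line of arithmetic. A small Python sketch, where the 1/10 fraction comes from the comment above and the 2x local speedup is just an example value:

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when only `fraction` of the runtime gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Doubling the speed of code that takes 1/10 of the frame time:
overall = amdahl_speedup(0.1, 2.0)
```

Here `overall` comes out at roughly 1.05, so a hypothetical 60 fps scene would gain about 3 fps, and even an infinitely fast version of that code could not push the gain past about 11%.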
With all this new knowledge on video game grass (which is not something I expected life to bring to me but I'm not mad) I feel the urge to make an open world that's just a grassy field. A really really nice grassy field.
A thought occurred to me when culling and chunks were mentioned. Would it be possible to further optimize by reducing the number of grass blades relative to distance each chunk is from the camera? While this may not reduce overall GPU load considering the extra calculations it could at the very least extend the range of which grass is visible, while simultaneously allowing for a gradient fall-off, which eliminates the need for a fog that could obstruct distant objects like mountains or buildings.
That'd probably result in it looking noticeably sparse in the distance - which could then be fixed by making the blades thicker, similar to what mipmapping does to transparent textures.
Alternatively, billboards in the distance, funny blades up close.
@@AlphaGarg I think Zelda BotW has this, although it's not that noticeable, nor is it too cumbersome given the gradient shader it has going on! I think this is a great idea!
@@AlphaGarg I was thinking about that just as I finished watching the video.
Another option would be to replace the more detailed 3D grass with animated billboarded images.
Or, you could replace the texture of faraway areas that have grass with a grassy texture.
A mix of all of these would probably be desirable
@@myrealusername2193 They could use 1 tri polygons for the entire thing, the problem though is you need to enable alpha testing which is expensive. They could experiment with different configurations to see which is more optimal, such as high poly mesh with no alpha testing, vs. 1 tri mesh with alpha testing enabled. It might even depend on what gpu is used also.
This is great because you're optimising the grass based on the worst case scenario. I think the point you made at 7:35 is important, your actual game world is probably never going to have a scene with that much grass in a single frame. Come to think of it, the only area of BotW with that level of grass density is Hyrule field, but its important to note that there is almost nothing else populating Hyrule field.
I was looking into how to make grass and I did *not* expect to get a full video series on it. Thanks for this super handy resource!
Be sure to read the other comments for more potential optimizations! I def have a lot more I could do
I found your channel recently and all I can say is, it's so knowledgeably chill. Thanks
awesome demo, in breath of the wild, the "high poly" grass is only two triangles, in an elongated diamond shape and it still looks good, I wonder how much of a performance boost that would give your version?
Probably like 5-10 fps i think
@@Acerola_t I’d imagine it’s much more worth it on a Switch due to its low power and only 4 gigs of shared RAM. Lower polygon counts would save a decent amount of RAM as well as adding performance.
@@myrealusername2193 The memory cost comes from the stored positions of each grass blade. The grass blade mesh itself is not an issue at all, especially when we're talking about a 1 or 2 tri difference.
@@Acerola_t hmm, would that be from only storing the mesh once and then drawing it at each position instead of storing an object with all of the geometry at every point?
I don’t really have any experience with 3D graphics (all the games I’ve made are web games with the only animation being the occasional fade in/fade out or animated flags) so I don’t really know how exactly all the geometry and objects and things are stored in memory. I’d imagine it depends for each system, though there might be some standard way all shaders and engines generally use.
I’m surprised that the positions take up so much memory though, are they stored as floats or something? Or like a 64-bit signed integer would likely take up a lot of space as well.
@@myrealusername2193 Buffering the offsets is 96 bits (three 32-bit signed floats, i.e. 12 bytes) * 7 million grass blades, so ~700 Mb (about 84 MB) just for the offsets. The GPU does have options for lower precision floating point representations, but you're working with worldspace co-ordinates, so you would end up with grass looking extremely wonky after traveling a relatively short distance from origin due to each vertex getting subtly (or not so subtly) "shifted" by precision loss.
You -could- try to manage that by introducing some kind of "origin shifting" logic, but I suspect you'd end up with some subtle bugs, and troubleshooting shaders is not a lot of fun because you can't really do stuff like step-based debugging on the GPU - there be dragons. Origin re-mapping is hard enough to manage on the CPU without bugs.
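To put numbers on both points, here is a small Python sketch. It assumes three 32-bit floats (12 bytes) per blade and uses the standard library's half-precision format to show the kind of error a 16-bit float picks up far from the origin. The 1000.3 coordinate is an arbitrary example.

```python
import struct

def buffer_bytes(count, floats_per_item, bytes_per_float=4):
    """Size of a tightly packed buffer of float data."""
    return count * floats_per_item * bytes_per_float

def to_half_and_back(value):
    """Round-trip a value through IEEE 754 half precision (16-bit float)."""
    return struct.unpack('<e', struct.pack('<e', value))[0]

offsets_mb = buffer_bytes(7_000_000, 3) / 1_000_000   # 84.0 MB
error_m = abs(to_half_and_back(1000.3) - 1000.3)      # about 0.2 m off
```

Between 512 and 1024 a half float can only represent steps of 0.5, so a blade about a kilometre from the origin can land tens of centimetres from where it was placed.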
Really good video! This has made me really interested in learning compute shaders. I can't wait to see what topic you cover next.
Thank you!
Compute shaders are my favorite thing ever, they're so powerful. If you want some good resources for getting started with them I recommend:
kylehalladay.com/blog/tutorial/2014/06/27/Compute-Shaders-Are-Nifty.html
and
blog.three-eyed-games.com/2018/05/03/gpu-ray-tracing-in-unity-part-1/
@@Acerola_t Awesome thank you man
Also one way to reduce the memory usage would be to remove the cached displacement value and instead sample the height map in the vertex shader, but it would add 9 texture samples per grass blade (3 for LOD) so I think taking the memory hit is more ideal.
Additionally you could probably bit pack the uv coordinates, but I'm not confident on that.
@@Acerola_t Can't you attach extra data to a mesh instance that'd get passed to the vertex shader? You'd only need to query the heightmap texture once then, on instancing. (Not a GPU guy, I don't know if I'm making sense.)
Also, bit packing the coordinates seems very doable. Store a reference centre coordinate in full 32-bit floats for each chunk in a quad-tree, then apply a -1.0 to 1.0 uv-coordinate space "modifier" for each leaf of grass - with only a byte-size integer, you could index a 0.5m x 0.5m chunk to every ~2mm on each axis. Looks to me like you could get down to 2 bytes minimum, 4 if you need a bit more resolution (half-floats, if available, would work great), assuming the z axis can be omitted; if it can't, it also doesn't need more than 1-2 bytes, since the chunk can also define a reference average or middle offset.
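A minimal Python sketch of that chunk-relative packing, assuming one unsigned byte per axis and a 0.5 m chunk. The function names are made up for illustration, and it uses a 0 to 1 range measured from the chunk corner rather than -1 to 1 from the centre, which is equivalent up to an offset.

```python
def pack_offset(pos, chunk_origin, chunk_size):
    """Quantize a coordinate inside the chunk to one byte (0..255)."""
    t = (pos - chunk_origin) / chunk_size
    return min(max(round(t * 255), 0), 255)

def unpack_offset(q, chunk_origin, chunk_size):
    """Recover the approximate world coordinate from the packed byte."""
    return chunk_origin + (q / 255) * chunk_size

# Two bytes per blade (x and z) instead of two 32-bit floats.
origin_x, origin_z, size = 128.0, 64.0, 0.5
packed = bytes([pack_offset(128.1234, origin_x, size),
                pack_offset(64.4321, origin_z, size)])
x = unpack_offset(packed[0], origin_x, size)
z = unpack_offset(packed[1], origin_z, size)
```

With a 0.5 m chunk the step is 0.5 / 255, a little under 2 mm, so the worst-case placement error is about 1 mm per axis.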
Was just about to make some sexy grass for my game. Thanks for the tips and have a great day!
i genuinely love how much i learn from these videos and how entertaining and casual they are
I have never been so invested into a story in my life.
This is better than any show out there.
I doubt it matters to say this at this point, but I think another option for covering up the fact that you're culling the far-away grass is to simply color the ground the same as the grass. At that distance, I doubt you would even notice that the grass stopped existing. If the animation of the grass changes the color of it enough to give it away, you might also consider applying an animation to the ground itself to match the change. I say this because although fog is very common in games, it makes sense to strongly consider other options before using it given how it destroys any hope of viewing landmarks or terrain in the distance. Standing on top of a hill and looking out to the surrounding terrain and having it fill you with a sense of awe and wonder is a very desirable effect, and cutting corners elsewhere to achieve this effect is often worth it. Good video and I'll look forward to any new content.
your videos are so good, it blew the power to my house. but i'm back and digging on the smooth sounds.
Thanks! glad your power has returned
No doubt someone has already said this, but you might be able to use a mesh with more blades of grass and decrease the density.
Sure, each mesh would have more polygons, but you'd be keeping a track of way less meshes, no doubt making it cheaper on the GPU's VRAM.
You could also put less blades of grass into the lower LOD meshes to make a smoother transition between the grassed and grassless areas.
Oh and you could use billboards at a distance too. Not like anyone would notice.
Those would be great further optimizations, quite frankly the only reason there's so many meshes in the example is to demonstrate how fast the technique is at extreme numbers lol
Thank you chunks (:
These videos are genuinely fascinating
This is cool, you could probably optimize it further by combining your billboard grass with meshed grass and using the billboard ones at longer distances.
This was my first thought as well. It might need some dithering to make the transition seamless, but I think it would work well.
@@kered13 I don't think dithering would work well. Zelda Breath of the Wild actually uses this combination of mesh and billboard grass I was talking about, so apparently I'm not the first to think of this idea
@@mariovelez578 I mean dithering between the full model and the billboard grass so that the transition between them isn't so sudden.
@@kered13 Yes, I know, but won't the dithering make it look grainy in that area?
You could probably save a lot of that memory by having the grass procedurally generated, so the memory would only be used for a brief period of time, but then the procedural generation would have to be extremely fast in order to not hinder the rendering performance if calculated per-frame. Nice explanation of how it all comes together to make grass like this possible.
Hoping you follow this up by adding Ghost of Tsushima's grass stuff: rounded normals for the grass blades to hide that they're 2D, varying height, and shorter grass blades converted into 2 blades to make use of the vertices and add density
Thank you chunks!!!
This was a very nice 45 minute journey through grass. I love that you still respond to comments on nearly 2 year old videos.
Worked! What an absolute genius mad lad! Was so easy
Thanks for the video. In case it helps anyone, I was able to triple the framerate for the geometry grass shader simply by pulling in keijiro's latest noise include and using the 2D simplex version instead of the 3D snoise which is overkill since the Y isn't required.
Yeah I didnt bother optimizing the geometry shader grass because I'd rather do literally anything else
You're like Dani, but actually enjoyable to watch
Is there a way to make this grass interact with physics objects? Like leaving a tire trail with a car.
You'd have to get much fancier to do that; since this stuff exists only on the GPU, physics aren't possible. For stuff like trail tracks you'd have to keep track of a global texture that overlays on the terrain and then you write to that texture as the player drives around, then the grass shader would sample from that texture to see if it has been smushed down or not.
thank you chunks! what a kind fellow
Would be cool to see another intermediate LOD where it goes back to the textured quad approach for anything more than ~10m away. Would be near indistinguishable since it only looks bad when viewed from directly above, and unless you have an ortho camera that spot would be covered by the better grass. Might eliminate the need for fog.
This might have been the best video I've seen on anything shaders related.
Nice video, it looks amazing 💙
I wonder if some of the optimizations used to render hair could also be used to render grass. Or if people working on grass optimization also used those to render better hair, fur, etc., because they're similar problems of "render a lot of this thing protruding from this other thing" except with hair the "other thing" may be moving.
I kept expecting at some point that you'd have a radius around the camera outside of which there weren't individual blades of grass, but just like a blob disguised as grass. Kinda like you did where you stopped drawing the grass altogether at a certain radius, but colored the same color as the grass rather than brown.
You can also check which grass is more than 3/5 covered and swap it for the mid-detail model, or, if it is a bit further away, even use the lowest-detail model. Grass that can't be seen because other grass blocks the view can also be removed altogether, but for this you have to take the height and position into account.
Hi. I love your videos! I downloaded this grass project to play around with it a bit and understand everything in more detail. The grass culling is quite expensive. If I only cull every 5th frame, the FPS increases from ~270 to ~440, and if I don't cull at all, I can get to around ~480. At those frame rates I've found it's very difficult to see the lack of culling for 4 frames unless the camera moves/rotates extremely fast. If we can link the camera to the grass and only cull every second or third frame while the camera view is changing, we can actually get a considerable FPS boost.
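A rough Python sketch of the "cull less often" idea above: reuse the previous visibility result, and only re-run the cull when the camera has changed and a minimum number of frames has passed. `ThrottledCuller`, `cull_fn` and the camera state tuple are hypothetical names for illustration; in the actual project the cull is a compute dispatch rather than a Python callback.

```python
class ThrottledCuller:
    def __init__(self, cull_fn, interval=3):
        self.cull_fn = cull_fn          # expensive visibility pass
        self.interval = interval        # minimum frames between culls
        self.frame = 0
        self.last_cull_frame = 0
        self.last_camera_state = None
        self.cached = None

    def visible(self, camera_state):
        """Return the (possibly cached) visible set for this frame."""
        self.frame += 1
        moved = camera_state != self.last_camera_state
        due = self.frame - self.last_cull_frame >= self.interval
        if self.cached is None or (moved and due):
            self.cached = self.cull_fn(camera_state)
            self.last_camera_state = camera_state
            self.last_cull_frame = self.frame
        return self.cached
```

The trade-off is the one described in the comment: for a few frames after a fast camera turn, the cached set can be slightly out of date.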
damn bruh. Thats exactly the type of nerdy computer graphics stuff i dig. Im in no way close to coding my render engine, but i work in Houdini and instancing grass here is pain in the ass too.
amazing video bro keep it up
Thanks!
I'm very happy I found this channel. Others are only really entertaining and fun I guess, but the real genius of game development is far more interesting. It also helps that your presentation is witty and concise. As a casual gamer and professional procrastinator I find this all super cool. :)
Awesome videos!
Thank you chunks!!
You can actually put a bit of sampling in the pixel shader, a bit like parallax occlusion mapping (look it up, it's a common method people employ), and you can actually develop a shadow from it, instead of the displacement map, and it will shadow it all into the distance! It would definitely make it look cooler. Parallax occlusion mapping is actually the first raytracing-type method people did on the GPU. Realtime.
This would be very useful for a mysterious, serene, foggy field in an experimental indie horror game which appears in two scenes for no apparent reason, and then places you back in the game, just to make you question whether what you're experiencing is *supposed* to be coherent or not.
Chunks are a blessing for both end-user and in-editor performance, and thank you for remembering that this thing called LOD exists. I was rendering entire scenes from far away of a city that didn't downscale the building and car models. At first it wasn't a problem because there weren't too many models or effects in the scene, but after some time I was getting 30 fps in game and 50-60 in editor. I will try to implement LOD. Many thanks.
Thank you, chunks! :)
underrated videos
Thank you chunks!
I know you said you wanted to be done with the grass, but I'm curious: is it possible to have objects (players and NPCs) affect the grass movement in a performant way? (flattening or making it shake as it's moved through)
I've seen other tutorials on this for snow and grass using a displacement texture but I'd be curious to know if it works with this grass technique
Yeah the same technique is used for snow and grass. The player's position is written to a texture that overlays across the field and the shader then samples from that texture to inform itself if it should be flattened or not.
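A CPU-side Python sketch of that overlay texture, just to show the data flow. On the GPU this would be a render texture that something writes into and the grass vertex shader samples; `TrampleMap`, its resolution and the decay step are assumptions for illustration.

```python
class TrampleMap:
    """Low-resolution grid laid over the field; 1.0 means fully flattened."""

    def __init__(self, world_size, resolution):
        self.world_size = world_size
        self.res = resolution
        self.texels = [[0.0] * resolution for _ in range(resolution)]

    def _texel(self, x, z):
        u = int(x / self.world_size * self.res)
        v = int(z / self.world_size * self.res)
        return min(max(u, 0), self.res - 1), min(max(v, 0), self.res - 1)

    def stamp(self, x, z):
        """Called wherever the player (or a tire) touches the ground."""
        u, v = self._texel(x, z)
        self.texels[v][u] = 1.0

    def decay(self, amount):
        """Let the grass spring back a little each update."""
        for row in self.texels:
            for i, value in enumerate(row):
                row[i] = max(value - amount, 0.0)

    def flatten_amount(self, x, z):
        """What a blade at (x, z) would sample to decide how far to bend."""
        u, v = self._texel(x, z)
        return self.texels[v][u]
```

Stamping every frame along the player's path leaves a trail, and the decay controls how long the trail stays visible.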
Perlin noise texture input for grash splat map and you have one helluva grass shader ^-^
Basically no real scene needs that much grass everywhere so yeah this is pretty great!
hehehe "grash splat" I'm a dum
Didn't realize a video about John Lennon talking about the grass could be so fun and educational. But I honestly didn't think you'd reincarnate till Yoko was gone.
Since the grass is now stored in chunks (I also saw someone suggesting using quadtrees for better, more fine-grained culling), you maybe don't need to store as much data on the leaves' positions. Let the chunk itself store its world coordinates; let the leaves' positions be defined relative to the chunk. That way, you might be able to store what are currently floats as 1-byte integers or something. I saw this technique used in a video about a voxel-based game engine, but I'm thinking that it could work in any scenario with a well-defined world grid of some sort being used.
Optimization is my favorite topic you cover
wow thx man
0:59 what does the kanji mean?
i don't know
Acerola obviously...
Considering the number of grass increases exponentially with distance from the camera, you really want to cut down on the number you render at a distance as well.
If you had a narrow band of low poly grass, then filled the following chunks with 2d decals, you could probably save even more vram.
Rendering every third grass point as one decal containing three blades of grass could give a final performance boost.
Not exponentially. The growth is only polynomial, and less than cubic.
@@drdca8263 Yes, you're probably correct. I didn't think through this comment before posting it.
The main point is still true though, the last row of grass is much harder to render than the closest one. So any cheap trick you can make in the distance could be worth it.
@@drdca8263 If you're thinking in terms of exponential functions in fancy maths (and, relatedly, running time), sure. But I think in other instances, "exponential" simply refers to whenever there's a parameter being raised to some power. And we're talking about grass straws here. The number of those would be going up with the area of the circle defined by the view distance from the camera; pi*r^2. That could be said to be polynomial, but colloquially also "exponential".
@@mnxs I’m conflicted. On the one hand, I have had objections to people insisting that because people who study a particular topic use a certain word a certain way, that other people are also obligated to use the word that way rather than a different way.
But, at the same time, I strongly prefer that everyone did not refer to things that grow polynomially as “exponential”.
This is probably somewhat hypocritical of me…
I guess it is because I think math is so great and i want everyone to know it?
But like, people who study other fields probably feel the same way about their favored field.
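For reference, the growth discussed in this thread is easy to check numerically. A tiny Python sketch, assuming uniform blade density on flat ground and a camera that sees a circular sector of the field; the density and field of view values are arbitrary.

```python
import math

def blades_in_view(radius, density, fov_degrees=360.0):
    """Blade count inside a circular sector: density * area."""
    return density * math.pi * radius ** 2 * (fov_degrees / 360.0)

near = blades_in_view(100.0, density=50.0, fov_degrees=90.0)
far = blades_in_view(200.0, density=50.0, fov_degrees=90.0)
ratio = far / near
```

Doubling the view distance gives a `ratio` of 4 regardless of density or field of view, which is quadratic growth rather than exponential growth.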
What have you done to me ? Im obsessed with grass now.. Im supposed to be a gamer not some kind of.. real life human thing who goes outside or whatever that is.. Anyways great video lol, Im wondering about how you could fix the camera issue you were talking about using the original method tho
thank you chunks :)
Great content.
Question for your brave soul: have you yet traversed the depths of hell known as optimizing for standalone VR devices?
I have not and probably wont for a long time lol
Why would you store each chunk individually, when you can use a single buffer multiple times? Make the grass repeat along its edges, and read the buffer with a scalar value corresponding to the chunk position. It would be slightly slower, but you wouldn't have the VRAM issue. You would also have to change the culling of grass to work per chunk, instead of doing it per grass model in each chunk, basically stopping at 2:12 instead of 2:13. It might actually give you performance, as sorting buffers can be generally expensive in terms of performance.
Also what about reducing the grass range even further, and adding a granular post effect layer to the terrain instead? Far in the distance the grass looks like noise anyway, you just have to map the displacement wind noise to it in some way to make it look realistic. Also you would have to use the shader to light the visible terrain up to the sky edge.
Would love to hear why my suggestions wouldn't work. Destroy me :D
very cool grass
Great series! I learned a lot. Could you save space by rendering multiple blades per position? For example, one position could have 3-4 off shoots.
Also, not sure if this will make a difference, but depending on the hilliness of your maps, the grass could skip rendering on the downslope side of the hill.
THANK YOU CHUNKS!!!!!🎉🎉🎉
Acerola: "Everyone has 16 GB of ram now a days"
Me: "4 GB is best I can offer"
So in conclusion, this is very doable, but you need to keep that final cost in mind and weigh whether or not it's applicable to the project you're working on. Duly noted. I was considering using this for a VRC world where the only other performance heavy things are 1, people's avatars, and 2, a video player. I'll have to see how much of an impact it will have with other level geometry loaded in.
What if you gave the ground after the cutoff distance the same color as the actual grass, and applied the same noise on top of it to make it look darker as if there is grass moving there too?
Yeah you could do that instead of using fog to obscure the cutoff
Also maybe giving the ground a flat grass texture might help at high altitudes
In the far off distance, all the grass blurs together. Is it possible to instead have a single surface that replaces the grass at some distance and have the texture be some function of color calculated by how much the grass should be bending in that region (from the effect of either wind, something in the field, length, etc etc)?
you earned a sub
Thank u chunks
Thank you chunks!
Could you reduce the memory cost if you store only the Ids for referencing the blades from the first buffer inside the second buffer?
I NEED MORE GRASS VIDEOS
I'd recommend swapping 3D grass for 2D billboards when the grass is a certain distance from the camera as well
I was entertained, and you will never know for sure (because you cant see me) but I was razzle dazzled.
Can you actually look somewhere for how much memory is used exactly?
You kinda have to explicitly state the size of your memory buffer so you can just look at the argument you are passing in to determine that
thanks, chunks
Hi! I've been subscribed for a couple of days, and I'm loving your tutorials.
You give just the right amount of detail needed to make it both fun to watch, easy to understand and follow along, I think you'd be a great teacher :)
I'd like to ask something though, if you don't mind. What technologies/programs/ides do you use in your videos? You inspired me, and I want to start developing a game, just for fun and to see what happens! :D
Unity for the game engine. Autodesk Maya for the 3D modelling, but Blender is quite popular (and free).
I was looking into your code and I am wondering how you created the height map. Do you have a video on how to create a height map out of a mesh?
I had an idea... You talked about LOD, but what if at a distance it would be one big mesh? Like a big mesh? You could actually do that, even in order: 1. best grass 2. LOD'd grass 3. one big mesh with the color of grass.
It is very obvious from looking that at the distance all the grass looks like just one color... You could abuse that ig...
Thank you, Mr. Chunks.
I think I found the perfect grass
bring back the grass saga !!!
1:50 Those are definitely not 300m², a 300m * 300m area is 90,000m²
What if you used this chunking method and still, after a set distance, transitioned to the GPU instanced grass from the first video for your LOD grass? Would that help performance gains? Would there be a noticeable line at the cut off and, if so, could it be mitigated?
i could tell this guy really liked grass off the glasses alone
Unity docs say "don’t use GPU instancing for meshes that have fewer than 256 vertices." Maybe it's more performant to batch groups of blades as a single model right inside the compute shader and gpu instance those groups instead.
Is there a reason provided lol cause I cant think of any reason why they'd say that.
As someone who has worked on serious games/milsims, grass is a fuckng nightmare because you have an expectation, especially in multiplayer environments, that if you are lying prone in grass, then people far away from you shouldn't be seeing the grass de-render and your soon-to-be corpse lying on a flat plane totally exposed.
Honestly still no good solutions to this day.
Regarding the memory efficiency, I believe UV and perhaps even displacement are unnecessary. UV should just be the same as Position.xz (unless Position is in camera space, in which case you could at worst introduce a camera pos uniform) and for the vertical displacement, it should be a solution to take like VertexID % 7 and use that to index the height in a constant array. (This is assuming that you have 7 vertices on a grass blade, you could insert a different number)
How feasible would it be to use a geometry shader to generate the positions of the grass as well based on what's visible in the frustum in the first place rather than sending the initial position buffer to the GPU. Like for each chunk you'd generate N positions for grass blades (pseudo-randomly but deterministic so it's the same every frame) using the height map displacement of the ground to get the base position, and from there generating the actual grass geometry. You'd cut about half the memory usage, but at the cost of more processing.
Also now I'm wondering if it would be possible to do this as a separate pass entirely using deferred rendering somehow, with a masking pass to mark where grass blades are supposed to go (probably wouldn't work because obviously some grass has to start off screen, though extended buffers might fix that? I don't know, I'm completely spitballing here)
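The "pseudo-random but deterministic" part of this idea can be sketched independently of which shader stage does the work. A Python version, assuming the seed is derived only from the chunk coordinates; the seed mixing constants and function name are illustrative, and the height map lookup is left out.

```python
import random

def chunk_blade_positions(chunk_x, chunk_z, chunk_size, count, world_seed=1234):
    """Regenerate the same blade positions for a chunk on every call."""
    # The seed depends only on the chunk coordinates, so nothing has to be
    # stored per blade; the positions can be rebuilt whenever needed.
    seed = (world_seed * 73856093) ^ (chunk_x * 19349663) ^ (chunk_z * 83492791)
    rng = random.Random(seed)
    origin_x = chunk_x * chunk_size
    origin_z = chunk_z * chunk_size
    return [(origin_x + rng.random() * chunk_size,
             origin_z + rng.random() * chunk_size)
            for _ in range(count)]
```

This trades the stored position buffer for extra work every time a chunk is generated, which is the memory versus processing cost the comment describes.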
Geometry shaders are extremely slow and not good for grass, sure the memory cost goes down but the fps will take a drastic hit.
@@Acerola_t - This is also keeping the grass in memory even when it's culled though, right? I wonder how effective it would be to cull each chunk on the CPU, and swap out non-visible chunks on the fly. It would mean swapping in 250k grass blades each time a chunk came into view, so I'm not sure how much of a performance hit that would be (and probably pretty bad when quickly moving the camera). I'm going to be doing something similar to this soon-ish, so I'll try some things.
You are a fucking genius, and me still trying to implement a grass shader in godot
Did you take any inspiration from the Sucker Punch GDC talk perhaps? They have pretty similar solutions for their grass rendering in Ghost of Tsushima.
No I think that came out after I made this
man I wish more games had better grass!
Okay now we have to unite and create ultimate grass experience that could be ran on any potato =)
nice video
Thank you!
What does the displacement from the ground do? Isnt a position enough ?
He's back.
Nice sunglasses.
we could also just draw an average colour gathered per chunk or bake a colour map to it.
Your grass has been hitting the algorithm and for some reason I keep clicking. Could you add a license to the github so we have explicit permission to reference/reuse the code in your repo and so we know exactly what you are okay with us using it for? I might make a more basic and optimized version of this for a game I am currently working on but you technically are not allowed to copy repo code and use it in production unless there is a license saying that you can.
Yeah sure I'll add one soon, feel free to use it!
@@Acerola_t Sweet! Thanks for all the great content, the longer I've been a game dev the more I've realized how much I still have to learn!