I worked in a studio at some point that did that trick you showed at the end. It worked for a while, but we ran into issues with overdraw, especially on lower-end hardware. All that air is being meshed and drawn: thousands, if not tens of thousands, of layered transparent triangles that the GPU can't optimize away. You can fix this with raymarching in the fragment shader instead: rather than 64*6 faces, you draw the 6 faces of one giant cube for the chunk and raymarch inside it, which usually outperforms the 64*6 faces.
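For anyone curious what that raymarch loop amounts to, here is a rough CPU-side sketch in C++ of the walk a fragment shader would do through the chunk's occupancy data. `Vec3`, `isSolid`, and the 64^3 layout are illustrative assumptions, not code from the talk or the studio mentioned above.

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Hypothetical chunk occupancy lookup; out-of-bounds counts as air.
bool isSolid(const uint8_t* voxels, int x, int y, int z) {
    if (x < 0 || y < 0 || z < 0 || x >= 64 || y >= 64 || z >= 64) return false;
    return voxels[x + z * 64 + y * 64 * 64] != 0;
}

// Amanatides & Woo style grid traversal; returns true and the hit cell on success.
bool raymarchChunk(const uint8_t* voxels, Vec3 origin, Vec3 dir, int hit[3]) {
    int   cell[3] = { (int)std::floor(origin.x), (int)std::floor(origin.y), (int)std::floor(origin.z) };
    float o[3] = { origin.x, origin.y, origin.z };
    float d[3] = { dir.x, dir.y, dir.z };
    int   step[3];
    float tMax[3], tDelta[3];
    for (int i = 0; i < 3; ++i) {
        step[i]   = d[i] > 0 ? 1 : -1;
        tDelta[i] = d[i] != 0 ? std::fabs(1.0f / d[i]) : 1e30f;
        float next = d[i] > 0 ? (cell[i] + 1 - o[i]) : (o[i] - cell[i]);
        tMax[i]   = d[i] != 0 ? next * tDelta[i] : 1e30f;
    }
    for (int iter = 0; iter < 64 * 3; ++iter) {   // enough steps to cross a 64^3 chunk
        if (isSolid(voxels, cell[0], cell[1], cell[2])) {
            hit[0] = cell[0]; hit[1] = cell[1]; hit[2] = cell[2];
            return true;
        }
        // Step along whichever axis reaches its next cell boundary first.
        int axis = (tMax[0] < tMax[1]) ? (tMax[0] < tMax[2] ? 0 : 2)
                                       : (tMax[1] < tMax[2] ? 1 : 2);
        cell[axis] += step[axis];
        tMax[axis] += tDelta[axis];
    }
    return false;   // walked out of the chunk: air
}
```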
The only issue I can see with this voxel lattice system is OBSCENELY MASSIVE amounts of overdraw.
9:19
@@uis246 That only applies to the naive, greedy, and binary greedy meshing algorithms. You can’t punch a hole in a quad without adding more vertices, and that’s what the voxel lattice technique is trying to avoid.
Yeah, very suspicious that the author does not even touch on this. And in his replies elsewhere in the comment section he is very vague about it.
@@IsmeGenius I can see it working for small-scale voxel rendering, but for a large world? This is quite literally the worst case of overdraw I have ever seen, bar none. Not even shell texturing comes close to this level of absurdity.
@@ravenmillieweikel3847 The voxel lattice doesn't create big quads in the first place. As an example, you can play Minecraft and in spectator mode move inside a wall of blocks, and you will see, or rather not see, all the quads that are occluded by other voxels.
I've done a similar approach to greedy meshing for compressing voxel models.
the binary greedy meshing is really clever, never occurred to me
The 3D/index conversion can be simplified to be more intuitive if you just see it as packing 3 different numbers into different bits of a single number. x + 64·z + 64²·y is the same as x | (z << 6) | (y << 12).
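A tiny sketch of that packing for a 64^3 chunk (6 bits per axis); the function name is made up.

```cpp
#include <cstdint>

// x + 64 * z + 64 * 64 * y, written as bit fields, since 64 = 1 << 6.
inline uint32_t voxelIndex(uint32_t x, uint32_t z, uint32_t y) {
    return x | (z << 6) | (y << 12);
}
```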
Sounds like using an octree where the root is the entirety of the visible world and the last layer of nodes represents the chunks, and if a branch ends early that means it's an area filled with air.
Edit: I just realized that you can further subdivide a chunk into 1/8, 1/8², etc., and if any of those volumes is empty you can skip it as well.
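A minimal sketch of that idea, purely illustrative and not the structure used in the talk: a node is either uniform (all air or all solid) or split into 8 children, so empty branches terminate early.

```cpp
#include <array>
#include <memory>

struct OctreeNode {
    bool uniform = true;   // true => the whole region is 'value'
    bool value   = false;  // false = air, true = solid (only meaningful if uniform)
    std::array<std::unique_ptr<OctreeNode>, 8> children; // only used if !uniform
};

// Collect solid regions, skipping whole subtrees that are pure air.
void collectSolidRegions(const OctreeNode& n, int x, int y, int z, int size,
                         void (*emit)(int, int, int, int)) {
    if (n.uniform) {
        if (n.value) emit(x, y, z, size);   // solid cube of 'size' voxels
        return;                             // pure air: branch ends early
    }
    int h = size / 2;
    for (int i = 0; i < 8; ++i)
        if (n.children[i])
            collectSolidRegions(*n.children[i],
                                x + (i & 1) * h, y + ((i >> 1) & 1) * h,
                                z + ((i >> 2) & 1) * h, h, emit);
}
```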
Did a similar thing to the greedy meshing for batching box colliders in a 2d grid based game
Same
Hey if this was your first public speaking event, you killed it!
Thanks! Appreciate it.
Brilliant, I loved it. You ended up on the best solution possible in my book. And I do mean it.
Always when possible the best solution is to not have a problem,
so if you have a problem you just stop and think: "but what if it's not a problem".
And then you don't have a problem anymore.
It's perfect!
Damn that was great!
It really reminded me of Alan Kay's talk "Power of Simplicity" which also covered this idea of trying to think outside of normal conventions in order to solve difficult problems in new ways and how that tends to be easier once you find a better way to frame them.
The only part I didn't really understand was that "frustum rotation" thing for the field of view.
Right, the frustum bit made no sense
@@fabien7123 Maybe something along the lines of:
Compute the look direction, multiply it by half the max distance, and then show only chunks around that position with a radius that covers the FOV.
It's easy: instead of centering the rendered grid's radius at the player position, they render the center of the grid in front of where the player is looking, to maximize how much of the grid is actually visible. Their wording is funky though 😅
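In code that is basically a one-liner; a hedged sketch, assuming a normalized look direction and a `maxDist` render distance (all names made up).

```cpp
struct Vec3 { float x, y, z; };

// Push the grid's center half the view distance along the look direction,
// so most of the grid lies inside the view frustum instead of behind the camera.
Vec3 gridCenter(Vec3 cameraPos, Vec3 lookDir /* normalized */, float maxDist) {
    float t = 0.5f * maxDist;
    return { cameraPos.x + lookDir.x * t,
             cameraPos.y + lookDir.y * t,
             cameraPos.z + lookDir.z * t };
}
```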
@@timmygilbert4102 Oooh ok I get it now! Yea that was pretty simple, thanks! :D
@@timmygilbert4102 You are the first person to get it from my poor explanation! Yes, this is the case, but I never re-explained it because I ended up not including it in future projects, though I did have it working. I think the gains are minimal at certain angles, where fitting the grid to the frustum essentially places the center of the grid where the camera is anyway. What I would do instead is have a budget of N chunk-lattices and re-position them to "capture" the maximal number of visible chunks in a frame, or just not move the player's lattice but rotate it with their camera, looking into what could be described as the global lattice cut in half. Every 180 degrees the player could set their look direction to the beginning of this "looping" capture grid. I had this working at some point but decided it wasn't worth doing. Maybe that made sense.
I implemented a renderer that uses a global lattice, but the overdraw was too much of a problem and I didn't know how to improve its performance, so I implemented an octree raymarching version instead and it outperformed the global lattice version by a lot.
using a non-triangle vertex really popped out at me as having potential.
I think the main problem with this sort of solution is hardware: GPUs are already very optimized for triangles, and trying to go against that would be needlessly inefficient, at least until we have better built-in voxel solutions on the GPU, sort of like how the new raytracing GPUs now have that built in.
Wow, this kind of blew my mind. I got lost a little bit when you were talking about the type buffer. My understanding of what you were saying is that the type buffer is a way for the GPU to take a voxel location in the world and look up the voxel type at that position (e.g. "sand" or "rock" or whatever). Then that type ID can be used to calculate the UVs in the texture atlas for that voxel face. And all that could be done in the vertex shader. Is that correct? (EDIT: No - it would have to happen in the fragment shader, right? Because there are no separate vertices per individual voxel face.)
Getting X, Y and Z coordinates from a one-dimensional index when the size of the chunk is a power of 2 is much faster with bitwise operations. Instead of the modulo operation (index%64) do the AND (index&63), and instead of integer division (index/64) do shifts (index>>6). Division (and modulo, since you still have to divide to find the modulo) are slow operations, and bitwise operations are much faster.
But I'm sure it's still fine to write it the way shown in the video, since the compiler is probably smart enough to realize what you are trying to do and will optimize it anyways into bitwise operations.
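For reference, the two forms side by side for a 64-wide chunk; the helper names are illustrative, not from the video.

```cpp
#include <cstdint>

inline void unpackDivMod(uint32_t i, uint32_t& x, uint32_t& z, uint32_t& y) {
    x = i % 64;
    z = (i / 64) % 64;
    y = i / (64 * 64);
}

inline void unpackBitwise(uint32_t i, uint32_t& x, uint32_t& z, uint32_t& y) {
    x = i & 63;          // same as i % 64
    z = (i >> 6) & 63;   // same as (i / 64) % 64
    y = i >> 12;         // same as i / 4096
}
// With constant power-of-two sizes, optimizing compilers usually emit the
// same instructions for both versions.
```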
Yeah, I've tested in the past with clang and it didn't change anything, probably still worth it for debug builds though!
Also bitwise and does the correct thing for negative numbers, where modulo will return a negative number!
@@abcxyz5806 Well, I don't think that working with negative array indexes is a good thing in the first place, but if you want, you can. And I don't see how it's helpful in any way here... but technically it works...
4:53 *Correction:* X is always in the range 0..63 (inclusive).
If it reaches 64 it is in the next chunk at index 0.
Heh, wrote my thesis about the same solution with similar progression. Small world :D
I even found my old forum post where I got it running in blender: "I think I call it Pixel Meshing, or how to 2miljon voxels 60fps. [new v3 update]"
Hello @VegetableJuiceFTW, I have created a prototype of this type of renderer, however I noticed a difference between the one described here and the one described in your forum post.
It seems Morley here is suggesting to have planes slice up the entire world (8192 * 4 + 255 * 2), while yours uses chunks.
I have tried slicing the whole world, but found it degrades performance by a lot, and in one of your comments you suggested it is a pipe bandwidth issue.
I was curious if you could help explain to me what is going on. I'm trying to learn more in depth about graphics compute.
Thank you very much!
I'm so glad I finally stumbled over this video.
This really inspired me on my quest to find a subject for my bachelor's thesis. I've known for some time now that I wanted to do something in the area of voxel-based procedural generation, and was wondering if a few of the concepts you mentioned could be applied to marching cubes or other isosurface extraction algorithms. Although with such great performance and the ability to render so many voxels you wouldn't need mc anymore.
How would you handle collisions if the mesh of the chunk includes the air? Am I wrong in assuming that you would need a pass of your meshing algorithm that separates air and terrain(like you show) to extract a collision mesh where you need it? Or could you maybe handle collision entirely on the gpu in some fancier way than a collision mesh?
Anyways, for your first talk you did very, very well! I love your approach to greedy meshing and your explanation was really nice to follow. Very good presentation - even years later.
The mesh and underlying block data don't need to have anything to do with each other.
You can collision test against nearby blocks with the safe assumption that they will always be lots of individual axis aligned bounding boxes.
What GPU hardware are you targeting? I would think that overdraw becomes a problem, especially in the case where you are looking at only very distant terrain and the GPU has to draw and discard 4000 full screens of pixels.
I ran an Nsight GPU Trace, and the percentage of frame time spent discarding on a world that expands infinitely in only two directions, while looking directly down the horizon, was around 2%. I would bet this would run on a 900 series just fine. If this ends up becoming a problem though, the only thing you have to change is that instead of having one global mesh for the entire world, you break it up into a mesh per chunk. So if your chunks are 64^3 or so, you would have a lattice mesh *specifically* for a single chunk, then offset it and draw using instancing maybe, or even just copy the whole mesh for every chunk, and it wouldn't break a sweat and would still be many orders of magnitude fewer triangles than what a traditional meshing pipeline would produce.
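A hedged sketch of that mesh-per-chunk fallback: one shared lattice mesh drawn once per visible chunk with a per-chunk offset uniform. The GL object names are assumptions, and the lattice shader program is assumed to be bound already.

```cpp
#include <GL/glew.h>
#include <vector>

struct ChunkOffset { float x, y, z; };

void drawChunkLattices(GLuint latticeVao, GLint offsetLocation,
                       GLsizei latticeVertexCount,
                       const std::vector<ChunkOffset>& visibleChunks) {
    glBindVertexArray(latticeVao);
    for (const ChunkOffset& c : visibleChunks) {
        // Same 64*6-face lattice every time, just translated to this chunk.
        glUniform3f(offsetLocation, c.x, c.y, c.z);
        glDrawArrays(GL_TRIANGLES, 0, latticeVertexCount);
    }
}
```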
I'm a bit confused about one point. You used an example of an 8192x8192x255 sized world. That's 17,112,760,320 voxels; if each of those voxel positions is associated with a material, and let's say we allow 1 byte per material ID, that's 17 GB of material data that you're saying to store on the GPU. That takes up 71-72% of the VRAM on a 4090. Obviously this data could be compressed, but even so that's a pretty astronomical amount of data. Are you doing anything in particular to optimize that: octrees, DAGs, RLE, anything? Because if so, that wasn't presented and it'd be interesting to hear about it. In preliminary tests with this method on smaller worlds, the performance seems good in crowded areas with nearby geometry due to distance checks, but in very wide-open areas the performance seems to drop significantly, to the point where I get much better performance just sticking with static planes for each chunk over the entire world, as I can skip rendering completely empty chunks entirely and thus have far, far reduced overdraw in very open scenes (like plains with lots of open air). A lower number of triangles doesn't necessarily translate into better performance.
I agree the triangle count is not always the cost here; it's more that it makes the scene very easy to represent. When it comes to data, there is necessarily a point in any game where you add a constraint such as "I am not going to have every single voxel set in the entire world". For instance, an occlusion culling method is necessarily an optimization for a general case and expects the player to not be able to see every single piece of geometry in the world at once; this is technically a design constraint and is something that can be considered during development. The naive solution actually gets you very far, as in, expecting that the entire world will not be set at once. Obviously this will change directly with however you are doing terrain generation, but you can optimize around this. As for overdraw, there are many obvious solutions. My point with this talk was to provide a good general solution for a common archetype of game, the block game. The method works especially well with a world that is not cubic, so where there are more fragment tests in this world (the horizontal axes, 8192 x 8192), these are also the axes more likely to be populated with voxels, cutting down the overdraw.
A really easy method to expand into is what you're mentioning, have a lattice per-chunk and do it that way, it would drastically decrease overdraw definitely. Another thing to try is to order faces and the draw calls of said faces based on a line algorithm every frame. E.g. in a compute shader, for every chunk based on its position relative to a camera, get a perspective correct ray and basically run DDA on it, getting the optimal ordering of faces using the reciprocal of the camera direction and then order your draw calls as such. This way you could analytically draw perfectly ordered faces, this also would be great for transparency sorting!
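A much simpler variant of that ordering idea, just sorting each axis's lattice slices by distance from the camera along that axis. This is not the per-chunk DDA version described above, only a sketch of the basic front-to-back ordering; names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Returns slice indices for one axis, nearest to the camera first, so faces
// are drawn roughly front-to-back (helps early-z, and reversed gives a
// back-to-front order for transparency).
std::vector<int> sliceDrawOrder(float cameraAxisPos, int sliceCount) {
    std::vector<int> order(sliceCount);
    for (int i = 0; i < sliceCount; ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return std::fabs(a - cameraAxisPos) < std::fabs(b - cameraAxisPos);
    });
    return order;
}
```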
how do you render different color/materiality of block types with these optimizations?
Yeah, I'm barely keeping up with you here, but that's pretty slick. I'd love to see some places where that's been put into practice.
Very cool talk. I hope there are going to be open source implementations
You could say that this presentation was Really Powerful
That was an amazing talk
As neat as the lattice method is, it doesn't seem practical. If you're targeting newer desktop GPUs, octrees and ray tracing are the way to go. If you need to use rasterization, whether for consoles, mobile, or older GPUs, vertex compression, octrees, LODs, and compute-based occlusion culling are the way to go. For examples of the latter, Ethan Gore has excellent YouTube demos of a voxel engine using traditional rasterization with OpenGL and achieving mind-bogglingly large voxel worlds many, many times larger than this.
This is so smart!
I've been trying to wrap my head around the best way of achieving greedy meshing whilst maintaining textures, lighting, ambient occlusion etc.
From an optimisation standpoint, do you fill the texture data for a slice of the global lattice in the fragment shader, or do it elsewhere and pass it in?
Given that the chunk data is already on the GPU, it makes sense to do the lookup from the fragment shader.
If you store IDs, and then have a separate buffer mapping indices to properties, you can lookup all the data you need pretty quickly that way. You can even add textures if you need them.
The downside of this approach is that it only works for pure voxels.
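As a sketch of what that per-fragment lookup amounts to, here is the same logic written CPU-side in C++. In the real shader this would read an SSBO or 3D texture; all names and the buffer layout are illustrative assumptions.

```cpp
#include <cmath>
#include <cstdint>

struct VoxelProps { float r, g, b; };   // whatever per-type data is needed

// Returns false where the fragment should be discarded (air or out of bounds).
bool shadeFragment(float wx, float wy, float wz,        // fragment world position
                   const uint8_t* typeBuffer,           // one ID per voxel
                   const VoxelProps* propsById,         // ID -> material properties
                   int sizeX, int sizeY, int sizeZ,
                   VoxelProps& out) {
    int x = (int)std::floor(wx), y = (int)std::floor(wy), z = (int)std::floor(wz);
    if (x < 0 || y < 0 || z < 0 || x >= sizeX || y >= sizeY || z >= sizeZ)
        return false;
    uint8_t id = typeBuffer[x + z * sizeX + y * sizeX * sizeZ];
    if (id == 0) return false;          // 0 = air -> discard
    out = propsById[id];
    return true;
}
```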
@@Caellyan Thanks for the clarification!
I'd hoped this was the case as it made the most sense to me.
I'll be having a good run at this in January.
@@LeBopperoni There's a separate video on his channel where he leads a talk describing how he builds the geometry and says he stores run-encoded IDs on the GPU.
Nvidia's paper also does something similar in their efficient raytraced SVO paper, though they use a single buffer for everything, which is better for data locality but makes it extremely difficult to modify the terrain. So if your voxels don't need to change at runtime, it might be best to keep them in a buffer you populate once. You also don't need to store a texture index property if you don't have multiple voxel types sharing textures; just have a texture array where index == ID (0 being air/transparent).
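If it helps, run-length encoding a column of IDs is about this simple. The (count, id) pair format here is just an assumption for illustration, not necessarily what the talk or the paper uses.

```cpp
#include <cstdint>
#include <vector>

// Encode a sequence of voxel IDs as (count, id) pairs.
std::vector<uint16_t> rleEncode(const std::vector<uint8_t>& ids) {
    std::vector<uint16_t> runs;
    size_t i = 0;
    while (i < ids.size()) {
        uint8_t id = ids[i];
        uint16_t count = 0;
        while (i < ids.size() && ids[i] == id && count < 0xFFFF) { ++i; ++count; }
        runs.push_back(count);
        runs.push_back(id);
    }
    return runs;
}
```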
@@Caellyan Yeah, I plan to just concatenate every world chunk's data and handle it as a single buffer. I can just store an index per chunk for updating, since I recycle through a pool of them anyway.
Where's the best place to follow the development of this engine? John Lin's YT has some videos posted but it's been years since the last. Even a pulse check every now 'n then would be nice.
I love what the captions say at 13:10
Divide and modulo may be replaced by shift and mask for getting coords from the index.
I've played with greedy meshing several times, but still haven't found a good way to get rid of T-junction artifacts.
That's neat, but how do you texture a triangle that encompasses different types? Do you check the type per fragment?
In GPU pipelines one usually distinguishes between the vertex shader, which decides which vertices to render, and the fragment shader, which renders pixels (rasterizes) across each triangle. I assume you would simply use the type buffer to decide which colors to render in the fragment shader.
@@sullivan3503 The vertex shader can decide what vertices to render, kind of, but it's more about *how* it renders a set of vertices than deciding how many vertices to render. At least this is how I have seen them used.
In a fragment shader you have access to the UV (texture) coordinates of the pixel as well as the screen space and world position. I think in this case the UV would be sufficient, perhaps with a uniform indicating global chunk offset. You would choose the texture from an atlas based on floor(UV * 64) and pass the remainder to the texture sampler.
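Roughly, that lookup could look like the following, written as a CPU-side C++ mirror of the fragment logic. The atlas layout (a simple grid of tiles indexed by type ID) and all names are assumptions for illustration.

```cpp
#include <cmath>
#include <cstdint>

struct Vec2 { float u, v; };

// sliceUV covers the whole 64-wide slice in [0,1); sliceTypes holds 64x64 IDs.
Vec2 atlasUV(Vec2 sliceUV, const uint8_t* sliceTypes, int tilesPerRow) {
    int cx = (int)std::floor(sliceUV.u * 64.0f);   // which voxel column in the slice
    int cy = (int)std::floor(sliceUV.v * 64.0f);
    float fu = sliceUV.u * 64.0f - cx;             // position inside that voxel face
    float fv = sliceUV.v * 64.0f - cy;
    uint8_t id = sliceTypes[cx + cy * 64];         // voxel type for this cell
    int tileX = id % tilesPerRow;                  // tile position in the atlas
    int tileY = id / tilesPerRow;
    float tile = 1.0f / tilesPerRow;
    return { (tileX + fu) * tile, (tileY + fv) * tile };
}
```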
I'm also implementing a voxel engine, but I can't quite follow the part where you use a single pair of triangles for the entire side of a chunk.
This might work in 2D, but in 3D, you're only taking into account the edge voxels. How would the inner voxels be rendered? If there are no inner triangles being rasterized, how are those voxels being drawn?
I'm not quite sure what you mean. If you have a chunk that is 32^3, you have 32 faces per axis, * 2 for both directions, positive and negative. This ends up being 32*6 for an entire chunk, but the beauty of the Global Lattice is that you can use faces for the entire world, so if your world is 8192 ^ 2 * 255, that becomes 8192 * 4 + 255 * 2 faces.
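For scale, here is that face count worked out: one full-world quad per slice, per axis, per facing direction.

```cpp
#include <cstdio>

int main() {
    int w = 8192, d = 8192, h = 255;
    // The two horizontal axes contribute w and d slices each, the vertical axis h,
    // and every slice gets a front- and back-facing quad.
    long long faces = 2LL * w + 2LL * d + 2LL * h;   // 8192*4 + 255*2 = 33278
    std::printf("global lattice faces: %lld (%lld triangles)\n", faces, faces * 2);
}
```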
Oh, I understand now. You're applying these to slices along all axes within the chunk. That's pretty simple! Nice!
@@davis-morley Oh, I initially thought that you were drawing the external faces of a chunk and having the fragment shader do a Bresenham line walk into the chunk's binary data to find a hit.
I would have never thought that drawing all slices in a chunk would have been performant. Nice that it works!
I also didn't get that you draw the slices on top of each other; from the video I thought you put a mesh on the chunk borders only and somehow drew the entire chunk with it.
amazing talk
I think voxels take up more space than vectors.
It's a shame about the example clip they used at the start; that game project went dead silent after the first two tech teasers years ago.
Would've liked to see the type buffer - isn't the whole map stored as 3D UV maps in VRAM this way? Where are the limits here for various graphics cards?
I'm honored.
That's awesome! Won't you still need to do meshing for the physics colliders?
People always say this but in my mind you would just have your own physics solver or only mesh the relevant chunks (with binary greedy meshing) whenever you need to. The cost of binary greedy meshing is so low that this is feasible if required.
Wouldn't this use a ton of memory? Especially with high resolution textures?
How do you deal with internal faces of the individual voxels that are adjacent to air voxels? These small 1x1 faces that are supposed to be sandwiched between the wide 64x64 faces of a slice will be missing from the mesh, how do you draw them?
I'm not sure I understand, though this is not a problem. Which algorithm are you referring to, greedy meshing or the Global Lattice?
That microphone is so sensitive I'm gonna lose my mind, you can hear every breath and swallow.
I am confused about how the global lattice works. If you have a mountain in 3D space and only render the 6 faces of a chunk, then when you get closer to the mountain, wouldn't you clip into the chunk and then see nothing? I feel like I missed something. If I did, please let me know.
You aren't rendering just the boundaries of the chunk, that would just be 6 faces total. Instead, you are rendering every possible slice of the chunk along all 3 axes, hence 6*64 faces.
@@konignickerchen7265 Oh OK.
Hi, are there any updates on Voxely?
When removing color information from the mesh when turning it into binary, how is this information re-added for the actual rendering? I imagine it's not that intuitive to extract a texture from the voxel file.
out of curiosity, would there be overdraw problems with that many meshes drawn on top of each other?
You could draw meshes front-to-back
@@ukyoize It will still need to compute at least depth, up to 8k or more times for each pixel; excessive to say the least.
What is the Global Lattice's *2 in 255*2 for? Backface culling?
found it at 25:13
thanks, had the same question!
Does the global lattice use more GPU resources due to being partially transparent?
What if some of your voxels can have different models, like a cube for the default, but also campfires or whatever, and different materials, like animated lava etc, would the optimizations still work?
The optimizations would still work for the voxels, but not for the new general meshed objects, since they cannot be assumed to align easily to the simple 3D axes. You could still add (relatively) small amounts of meshes into the scene though, since the voxels are still technically rendered using meshes with depth etc.
where can i try this/make a game with this?
What is the impact of the depth test with so many transparent faces inside a chunk face?
Very minimal! There are a few ways to build on this algorithm that would give perfect draw order. There is a way to order the draw calls per frame to match their distance from the camera just using a line algorithm like DDA; I've not worked on this idea since this talk though.
@13:10 requires a certain amount of what?
14:02 How exactly would you do that?
OpenGL has something called an SSBO (Shader Storage Buffer Object) that the program can use to send data to the GPU.
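A minimal sketch of uploading a voxel type buffer as an SSBO. The binding point here is an assumption and must match the `binding` declared in the shader.

```cpp
#include <GL/glew.h>
#include <cstdint>
#include <vector>

GLuint uploadTypeBuffer(const std::vector<uint8_t>& types) {
    GLuint ssbo = 0;
    glGenBuffers(1, &ssbo);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
    glBufferData(GL_SHADER_STORAGE_BUFFER,
                 (GLsizeiptr)types.size(), types.data(), GL_DYNAMIC_DRAW);
    // Expose the buffer to shaders at binding point 0 (must match the shader).
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
    return ssbo;
}
```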
Sometimes, the only reason something is technically impossible is because all the technical experts in the field are so insistent that it just _cannot_ be done. It's impossible, so don't waste your time and (more importantly to them) don't waste *their* money attempting it.
There are *so many* far better, more effective, more efficient ways to do _just about everything;_ not just technical stuff like this, but *everything.* From economics to education to even more abstract things like philosophy and self-awareness, it's possible to do better than you're currently doing; all you have to do is stop insisting it's impossible, and just _try._ And not just a token display where you design the attempt to fail, just to prove it's impossible; but a real, genuine effort to test it.
Sure, it's great to learn from the mistakes of others; you don't need to re-invent the wheel and you don't need to do it wrong yourself if you've already witnessed someone else make the same mistake. But *did* _you_ witness the mistake? Technology was re-invented at times when it was either lost, or in separate parts of the world before its prior invention from one part had spread to another. Don't re-invent the wheel, invent a *better* wheel; you can patent it if it's different and innovative enough.
Very cool, but you gotta stop with the bold and italics. It's a chore to read.
@@1234macro It's to replicate the major and minor verbal emphasis I'd put on words or phrases if I were speaking. It helps better convey the tone and avoid the flattening that happens in long posts, since I can't really pick up on tone differences intuitively.
In other words, it's how I adjust for my ASD so my writing doesn't come across with incorrect emotional projection; I've had that issue before, where I type something in an explanatory or educational tone, but without my normal speaking emphasis, they read it as a rant or an angry, accusatory post. So just let it flow naturally as speech with bold as emphasized words, and italics as half-emphasized words.
@@omargoodman2999 Tone indicators should be used sparsely, and is a matter of taste. Your usage comes across as preachy and corny. You should seek to convey your tone through improved writing as opposed to enriching the glyph itself.
Tone indicators _ought to_ be used *sparsely,* and is a matter of taste. _Your_ usage comes across as *preachy* and *corny.* You _should_ seek to convey your tone through improved _writing_ as opposed to enriching the glyph itself.
I know which version _I_ prefer.
@@1234macro Actually, the second version is a lot easier for me to read. I read it, but my brain automatically processes it like "voice". It sounds more natural and explanatory. The first one is just flat text.
@@omargoodman2999 In that case, you might have a form of dyslexia.
How do we handle collisions? This is kinda pointless if we don't have any.
Although it is usually implemented that way in 3D engines, collision detection doesn't need triangles. In this case, sampling the voxel array directly will work just as well.
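A sketch of what that looks like: test the moving object's AABB directly against the voxel array, treating every solid voxel as a unit axis-aligned box. Names and the buffer layout are illustrative.

```cpp
#include <cmath>
#include <cstdint>

struct AABB { float minX, minY, minZ, maxX, maxY, maxZ; };

bool collidesWithWorld(const AABB& box, const uint8_t* voxels,
                       int sizeX, int sizeY, int sizeZ) {
    int x0 = (int)std::floor(box.minX), x1 = (int)std::floor(box.maxX);
    int y0 = (int)std::floor(box.minY), y1 = (int)std::floor(box.maxY);
    int z0 = (int)std::floor(box.minZ), z1 = (int)std::floor(box.maxZ);
    for (int y = y0; y <= y1; ++y)
        for (int z = z0; z <= z1; ++z)
            for (int x = x0; x <= x1; ++x) {
                if (x < 0 || y < 0 || z < 0 || x >= sizeX || y >= sizeY || z >= sizeZ)
                    continue;   // treat out-of-bounds as air in this sketch
                if (voxels[x + z * sizeX + y * sizeX * sizeZ] != 0)
                    return true;   // overlapping a solid unit cube
            }
    return false;
}
```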
@@stoician Whoa, I wish to implement this in Godot, but I can't really picture it from the description. It would be cool, but how would anything know how to collide and what normal to bounce off without triangles? How do we ensure collisions match the visual mesh?
I do a "greedy meshing" algorithm that finds the largest possible rectangular prisms to represent the solid voxels. Start with the first unmeshed voxel, expand in X direction, then Y direction, then Z direction as far as you can. Then do the same for each unmeshed voxel. Take all those rectangular prisms together as collision shapes for your kinematic body, and you're done!
@@anastasiaklyuch2746 I was thinking in a full custom sort of way, but if you are using Godot, there are things you could leverage to skip the details. Keep in mind, I've never used Godot, so this is only after a quick look at the documentation, but it looks like a StaticBody3D containing a BoxShape3D for each voxel would do it. Knowing that the fragment shader is going to use the fragment's location in 3D space, rounded to the nearest whole number, to index into the voxel array, you would iterate over the voxel array on the CPU, and when a solid voxel is found use the index to generate the location coordinates to place a BoxShape3D there.
But that will be a lot of BoxShape3Ds, which might be too much for the physics engine. So a more optimized way would be to use the coordinates of the movable object/player character to generate an index and loop over the nearby voxels to place BoxShape3Ds in only those locations. Then as the object moves, also move the BoxShape3Ds that get too far away to the voxels that get closer. That should be extremely fast since it will only require a handful of Shape3Ds.
To go more towards what I was thinking in my original reply, there might be ways to improve the efficiency further by subclassing Shape3D to make a VoxelChunkShape3D, but that would require deep knowledge of how Shape3D interacts with the physics engine. Perhaps looking at how SphereShape3D and BoxShape3D are implemented will provide some hints.
hackers crack the code, game devs write the code
Suggestion: make the video audio a bit louder. I had to watch this at 145% volume, and I'm lucky I even can: Windows and macOS users cap at 100%, and most Linux users also cap at 100%. I am in a special situation because I am using a window manager.
you are smart as fuck bro
video thumbnail has t-junctions
"really really fast, probably"
what?
overdraw 💀
Just skipped over how to implement the naive mesh; pointing to an example would have been nice. Nothing relevant on Google?
Oh yeah, there are so many tutorials for naive meshing. Literally just go through any Minecraft tutorial on YouTube; they never bother going past naive meshing. Google the b3agz Minecraft series, follow it, and you'll know how to do naive meshing.
That multidimensional-array-to-flat-array part is so incredibly overexplained, holy shit; this is such basic stuff. And a 3D array in any language you'd actually consider writing something efficient in would result in contiguous memory anyway, so there is literally only a syntactic difference between his implementation and what the compiler would produce. This is BAD.
Cool so you chose to focus only on the small kinda irrelevant part and disregard the actual meat of the talk, good job :) It did start slow tho, I'll give you that.
It's true that the coordinates-to-single-index conversion probably didn't need to be explained that deeply. It's trivial stuff that any programmer ends up learning soon enough.
However, it's false to think that the language will actually do this for you.
While it's mostly true for statically allocated buffers, this is no longer the case for multidimensional heap-allocated buffers, where the sub-buffers (along the first coordinate) are no longer guaranteed to be contiguous in memory.
Considering the actual size of the array here being 64^3, the total memory size can vary from 0.26 MB to 2.09 MB per chunk (assuming you only store a simple integral type in it).
While it may work for a single chunk, it is totally unthinkable to render an entire world with statically allocated arrays of that size.
It's raining "right"s.
Right?
You used y as the up axis instead of z? Literally unwatchable
What do you mean? Y has always been the up axis in 3D space and video games.
@@Ethanthegrand Z is the up axis in pretty much all of mathematics, I wouldn't know about games. My comment was not serious either, I really enjoyed the video.
@@seftondepledge3658 Ahh, I see. Growing up coding, when working with 2D games, X is left/right and Y is up/down, which makes sense, and introducing Z adds the forward/back axis.
To be exact, any axis could be up without any noticeable consequences, as long as the positive directions are maintained relative to each other following the right-hand rule (which basically means that the cross product of two axes gives the third one).
So if +X is right and +Y is up, then +Z must be back.
I tend to prefer having Z upward because it means that X can stay pointed toward the right and Y becomes positive towards the front, which makes more sense to me, and I think this is also the reason why mathematics sticks to that norm as well.
However, just like I was saying at the beginning, it barely changes anything in the end, as the direction ultimately ends up being relative to the camera orientation.
@@arthur1112132 Oh, it makes absolutely no difference to the maths I agree. But my brain prefers it that way. I think it also makes sense (in maths at least) as you are used to 2d graphs being on paper on a desk etc where the x-y plane is horizontal. So z being the vertical axis is just an extension of this. Not a reordering.
"55 FPS is slow"
Really? I don't even find 30 to be slow. If I saw a video that's 30 FPS and a video that's 60 FPS, I doubt I'd even be able to tell them apart.
Yep, there's a big difference between 30, 60 and 144. It's less noticeable on video because you're not inputting anything; higher FPS noticeably improves the responsiveness and smoothness of games.
55 FPS is definitely slow. in the context of real-time rendering for games, that is. and yes, the difference between 30 and 60 is definitely noticeable. most people can tell them apart. what is the refresh rate of your monitor?
It is slow in the context of real-time rendering.
First of all, we tend to easily forget that FPS is actually a very bad benchmarking unit, as it is non-linear.
Losing 5 FPS going from 60 to 55 is not the same as losing 5 FPS going from 200 to 195: the former means a 1.51 millisecond loss, while the latter only represents a 0.13 millisecond difference.
Now, even if it is highly dependent on the exact hardware you have, most GPUs are able to reach over 60K FPS when rendering less than a few thousand triangles, which means a minimum of roughly 0.016 milliseconds, or 16 microseconds, per frame.
When working with real-time rendering, we often consider our program as having a "frame time budget" that we are going to spend across all of our different features.
55 FPS means 18.18 milliseconds per frame, which is probably fine when talking about a completely finished game (even though some would argue with that).
However, when talking about a prototype like in the video, where the only thing done is meshing and rendering a single 64^3 chunk, 18 milliseconds IS slow, because it basically means that all of your time budget has been spent on that chunk alone and you won't be able to spend much more on anything else without negatively impacting the overall performance of your project.
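A quick sanity check of those numbers:

```cpp
#include <cstdio>

int main() {
    auto ms = [](double fps) { return 1000.0 / fps; };
    std::printf("60 -> 55 FPS:   +%.2f ms per frame\n", ms(55) - ms(60));    // ~1.52 ms
    std::printf("200 -> 195 FPS: +%.2f ms per frame\n", ms(195) - ms(200));  // ~0.13 ms
}
```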
Right?