Randomly stumbled upon this now and I must say this is a really clever way of dealing with voxels :-)
The Morrowind conversion is on hold indefinitely at the moment - I think I underestimated the complexity of the project, and the difficulty of teasing out the rendering logic from OpenMW and the Ogre rendering engine.
If someone was interested in helping me with this (I basically need to turn a Morrowind sector into a list of textured polygons that I can pipe into my triangle-to-voxel renderer), I might pick it up again.
This is pretty brilliant!
Thanks. I've put quite a lot of work into it :)
Tom Mulgrew He's a DF player too hahahaha. I told him about your video HAHAHA. It's magic, no doubt!
Tom Mulgrew - when is the next video? :-)
Tomkat53 when I next have something new to show I guess :-). I haven't had a lot of time to work on this lately, so could be a while.
Thanks for this, I just recently started experimenting with marching cubes and surface nets - this looks very interesting and is very inspiring!
I really like your hydraulic erosion technique, it looks VERY natural. Maybe the terrain is scaled a bit too tall, but other than that it looks fabulous!
Thanks. I can't claim credit for the erosion though. That was done by the L3DT terrain generator www.bundysoft.com/L3DT/
Love it. i'm totally stealing this idea. Yoink!
Don't know if you ever followed up on this, but if you used some compression algorithms with your height/displacement map, then you could significantly reduce the file size of the voxel model, since it renders directly from the height map.
Tessellation would also be a good way to handle progressive level of detail for polygons.
cool stuff.
Crazy to think that we used to have voxels for far distances and polygons for up close, now it's the opposite haha
@@LegendLength yeah, partially. Some games use voxels for their aesthetic, others for physics, like Teardown.
@@LegendLength well, voxels aren't anything new, however we've seen them get more and more popular with projects like Teardown or John Lin's sandbox.
Both of them use octrees to render huge amounts of data.
Very very impressive mate
Any particular reason for ray-casting in CUDA instead of a fragment shader?
I don't have heaps of experience with fragment shaders, but I suspect they may not be flexible enough. The octree is several hundred megabytes in memory and so can't fit into a texture, and each thread needs an array for tracking visited nodes (I've tried kd-restart but couldn't get it fast enough).
It could probably be ported to OpenCL though.
Thanks for the info. One more question, if you don't mind: are the traversal stack arrays in shared memory, or an ordinary buffer?
theb1rd it's a local array inside the kernel function - I think that means it's effectively global memory.
I hadn't heard of the shared memory approach (at least not for thread-local storage). Sounds definitely worth trying. Thanks for the tip :)
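For anyone curious, a minimal sketch of what the shared-memory variant might look like (hypothetical kernel; the stack depth and block size are assumptions, and in practice the array layout matters for bank conflicts):

```cuda
// Hypothetical sketch: moving the per-thread traversal stack from a local
// array (which spills to local memory, physically global, when dynamically
// indexed) into on-chip shared memory.
#define STACK_DEPTH 24
#define BLOCK_SIZE  64

struct StackEntry { unsigned nodeIndex; float tMax; };

__global__ void raycastKernel(/* octree buffers, camera, output... */)
{
    // One stack slice per thread: 64 * 24 * 8 bytes = 12 KB of shared memory.
    __shared__ StackEntry stacks[BLOCK_SIZE][STACK_DEPTH];
    StackEntry* stack = stacks[threadIdx.x];
    int top = 0;

    // During traversal, push and pop as usual:
    // stack[top++] = StackEntry{ childIndex, tExit };
    // StackEntry e = stack[--top];
    (void)stack; (void)top;  // placeholders so the sketch compiles
}
```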
very cool!
Another method of mixing voxels and polygons is to store the model as an octree, ray-trace the distance, and use Manifold Dual Contouring to convert the octree to polygons.
brilliant :)
Is there any advantage to doing the ray-cast in CUDA vs a GLSL compute shader?
Also, are you doing any reprojection of pixels that have already been ray-cast last frame so you only have to fill the holes, or just ray-casting all those pixels every frame?
DjChronokun I'm ray-casting everything each frame (excluding the pixels covered by polygons).
Re-projection sounds interesting though. Is there any prior research I can read up on? A few things aren't obvious, like content coming in from the side of the frame and obscuring re-projected pixels from the previous frame.
The main issue with a GLSL shader is that it has to be able to randomly address several hundred megabytes of voxel data. Correct me if I'm wrong (I'm not too familiar with GLSL), but I believe a shader would have to access the voxel data via texture objects, which would limit the amount of addressable memory and therefore the number of voxels in the scene.
Tom Mulgrew well, I did come across this: voxels.blogspot.co.nz/2014/05/raycaster-speed-up-up-to-400-by-image.html but it seems to come with some artifacts. I hadn't considered the case of things moving in front of existing pixels; that would seem to cause problems.
A GLSL shader would either have to access textures or shader storage buffers. I'm not sure if that would impose extra memory limits or not. Each SSBO is guaranteed to be at least 16MB, but I don't know how much total memory you can have; I'd guess it would be limited to the available graphics memory, which I think is usually about 2x the amount of RAM on the card, so it might require some extra manual paging of textures/SSBOs.
Very cool!
Do you know any papers that talk about 6DOF raycasting like this? Amazing work btw, I've thought of doing a raycast engine like this, but the jump from 4 to 6DOF is quite daunting.
Have you ever used this in a project? Have you ever considered releasing the code?
One question: Why do you render voxels as boxes? Wouldn't it be easier and more efficient to just use solid filled 2D shapes like circles or squares (either with hard or soft edges) and give them a little bit of overlap to prevent unfilled pixels from showing up?
It's essentially an aspect of the data structure and the fact that it's ray traced.
In an SVO the scene is stored as a cube divided into eight equal-sized smaller cubes, which are in turn also divided into smaller cubes, and so on, until you eventually get down to the tiny cubes that you render.
If you wanted different shapes (spheres etc), you'd need a different code path to handle the rendered nodes while still efficiently searching the higher level nodes. I'd expect that sort of branching to incur a significant performance penalty in CUDA.
It may be possible (there may be a clever trick to it that I don't know about), but I can't see how to do it efficiently.
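For reference, a minimal sketch of the octant addressing implied by that subdivision (the bit assignment is an assumption):

```cuda
// Hypothetical sketch of SVO subdivision: which of the eight child cubes a
// point falls in, encoded as one bit per axis.
struct Vec3 { float x, y, z; };

__host__ __device__ inline int childOctant(Vec3 p, Vec3 nodeCentre)
{
    int octant = 0;
    if (p.x >= nodeCentre.x) octant |= 1;
    if (p.y >= nodeCentre.y) octant |= 2;
    if (p.z >= nodeCentre.z) octant |= 4;
    return octant;  // 0..7, index into the node's eight children
}
```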
Tom Mulgrew I didn't mean that the space should be subdivided into sphere-shaped regions, but rather asked what the primitive shape that each voxel ultimately gets rendered as should look like.
joeseth05 it's the ray-tracing that makes it complicated. Rather than 1) Find voxels, 2) Render them, it's 1) Trace a ray through each pixel, 2) figure out what it hit, 3) draw the pixel in that colour.
So it's not deliberately drawing the voxels cube shaped, but rather the pixels whose rays hit the same voxel happen to form a cube-shaped projection, due to the nature of the ray-traced data structure.
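As a rough sketch of that per-pixel flow (placeholder names, not the actual engine code):

```cuda
// Hypothetical sketch of the ray-per-pixel flow: one thread per pixel,
// trace, find the hit voxel, write its colour. Helpers are placeholders.
__global__ void raycastPixels(uchar4* frameBuffer, int width, int height
                              /*, octree buffers, camera... */)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // 1) Build a ray through this pixel from the camera parameters.
    //    Ray ray = cameraRay(x, y);
    // 2) Traverse the octree to find the first voxel the ray hits.
    //    Hit hit = traverseOctree(ray);
    // 3) Write that voxel's colour to the pixel.
    //    frameBuffer[y * width + x] = hit.found ? hit.colour : skyColour;
}
```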
Tom Mulgrew Ah, OK, so it sounds like it employs ray-casting (a special kind of ray-tracing) for the rendering of the voxels, as opposed to rasterization, which I initially assumed; then of course the cube shape makes sense.
On the other hand, I wonder whether rasterizing voxel grids is faster than ray-casting them, because usually rasterizers are faster than ray-casters for all kinds of primitives, though rasterizers are more complicated. But these are voxel grids, and rendering voxel grids as primitives should be easier than rendering triangles, because voxel grids are even more primitive than triangles.
The only problem I can attest that voxel grid rasterizers pose in practice is getting the neighbor overlap right.
joeseth05 sparse voxel octrees may be the point where ray-casting overtakes traditional rendering, at least when you get down to 1 pixel per voxel. Chiefly because rasterizing time is roughly proportional to the number of primitives (there's about half a billion voxels loaded at any one time in my scene), whereas with ray-casting it's proportional to the number of pixels.
I can get 60-70fps at 720p. Jon Olick tweeted a few months back that he was managing 60fps @ 1080p on similar hardware (and seemed confident he could double it again).
very nice!
Do you use an Unlimited Detail-like voxel traversal optimization (using CUDA), or is each ray cast separately (a pack of rays per thread)? Also, do you use the UD-like voxel data size reduction optimization (linkless octree)?
Have UD published some details of their algorithm? I thought they kept it heavily under wraps... ?
The rays are each cast separately. The initial stack state (traversing from the root to the voxel containing the camera) is computed once into shared memory then copied for each ray, but that's about all the rays share.
The data is stored in fixed-depth sub-trees, which are paged in from disk and composed together to create the full tree. I don't store any topology data at all for the leaf nodes (just colour data), leaf parents store just an 8-bit mask of which children are present, and nodes above that store the mask plus a 3-byte index of the first child node within the sub-tree (so 1+3=4 bytes). The colour data is stored as per research.nvidia.com/publication/efficient-sparse-voxel-octrees.
The bytes/voxel depends heavily on the average occupancy. The scene averages about 3.5 child voxels per parent, which leads to 2.25 bytes/voxel.
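For concreteness, a rough sketch of that 4-byte node layout as described (assumed packing; not the actual engine structs):

```cuda
#include <cstdint>

// Rough sketch of the node layout described above (assumed packing).
// Leaves: colour only. Leaf parents: just the 8-bit child mask.
// Nodes above that: mask + 3-byte first-child index = 4 bytes.
struct InteriorNode {
    uint8_t childMask;      // bit i set => child octant i present
    uint8_t firstChild[3];  // 24-bit index of the first child in the sub-tree

    __host__ __device__ uint32_t firstChildIndex() const {
        return firstChild[0] | (firstChild[1] << 8) | (firstChild[2] << 16);
    }
};
static_assert(sizeof(InteriorNode) == 4, "packs to 4 bytes");
```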
Tom Mulgrew Algorithm in short: for the whole viewport, traverse octree nodes front-to-back (there are 8 possible orderings of the 8 child nodes; the order can be calculated with an XOR against a mask built from the three signs of (dx,dy,dz) from the camera origin to the parent node's centre):
1) If the child is behind the camera, skip it.
2) If the child is partially behind the camera, subdivide (and repeat the same front-to-back algorithm for its children).
3) If the child is in front of the camera but too big (not a single pixel), first test for occlusion (any shape that 100% covers all children tested against the stencil/z-buffer; a filled rect would suffice), then if not occluded, subdivide (add the child indexes to the stack in back-to-front order so they're processed front-to-back).
4) If the child is in front of the camera and small enough (a single pixel), draw it (and fill the stencil or z-buffer to help skip tons of occluded voxels); if it cannot be subdivided, draw a filled rect (same as a single pixel, but bigger).
In fact, we can keep all pixel-on-screen-sized voxels in separate memory and render them each frame, updating them dynamically via queries to load more voxels (when subdividing) and un-subdividing children when all of them are occluded.
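A minimal sketch of that sign-mask child ordering (hypothetical names; the octant bit encoding is an assumption):

```cuda
// Minimal sketch of the sign-mask front-to-back ordering described above.
// Assumes child octant bits: bit0 = +x half, bit1 = +y half, bit2 = +z half.
struct Vec3 { float x, y, z; };

inline unsigned frontToBackMask(Vec3 camera, Vec3 nodeCentre)
{
    unsigned mask = 0;
    if (nodeCentre.x < camera.x) mask |= 1;  // camera on +x side: +x children are nearer
    if (nodeCentre.y < camera.y) mask |= 2;
    if (nodeCentre.z < camera.z) mask |= 4;
    return mask;
}

inline void visitChildrenFrontToBack(unsigned mask)
{
    // i = 0 is the nearest canonical octant, i = 7 the farthest; XOR maps
    // that canonical order onto the actual child indices for this camera.
    for (unsigned i = 0; i < 8; ++i) {
        unsigned child = i ^ mask;
        // ...skip / subdivide / draw `child` as in cases 1)-4) above...
        (void)child;
    }
}
```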
UD optimization: switch from perspective projection of node points to ortho-projection when the estimated node size in pixels on screen becomes so small that perspective projection gives nearly the same result as ortho-projection. This trick saves lots of division-by-z operations and can be performed in fixed-point arithmetic. It's also patented: www.google.com/patents/WO2014043735A1
Linkless octree format: an 8-bit mask of children in the parent node; the next level of the tree contains only the valid children, and a child's index is calculated as the sum of set bits from the previous level plus the child's index inside its parent. The same goes for the levels below. This saves a lot of memory if only the surface of the voxels is encoded (not the volume). It requires lookup tables, but can be performed on the CPU.
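A small sketch of that linkless child lookup, assuming a precomputed per-level prefix table of set-bit counts (one possible form of the lookup tables mentioned):

```cuda
#include <cstdint>

// Sketch of the linkless-octree child lookup described above: each level is
// a flat array of 8-bit masks; a child's slot in the next level equals the
// number of set bits that precede it.
struct Level {
    const uint8_t*  masks;   // one child mask per node at this level
    const uint32_t* prefix;  // prefix[i] = set bits in masks[0..i-1]
};

// Index of child `octant` (0..7) of node `node` in the next level's arrays,
// or -1 if that child doesn't exist. (__builtin_popcount on the host;
// __popc in CUDA device code.)
inline int childIndex(const Level& level, uint32_t node, int octant)
{
    uint8_t mask = level.masks[node];
    if (!(mask & (1u << octant))) return -1;
    int before = __builtin_popcount(mask & ((1u << octant) - 1));
    return (int)(level.prefix[node] + before);
}
```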
Braid Acer Thanks. It'll take me a while to digest that. It does seem to me you'd need pretty efficient occlusion tests to make something like that work fast enough though (and not just testing one pixel at a time)?
The linkless octree format sounds like what I was considering for storing sub-trees on disk (a sequential list of 8-bit child masks for each node in breadth-first order). But I'd have to put the links back in when loading into device memory, as I can't see how you could efficiently traverse it otherwise.
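For illustration, a minimal sketch of serializing a sub-tree into that on-disk form (the in-memory Node type is hypothetical):

```cuda
#include <cstdint>
#include <queue>
#include <vector>

// Sketch of the on-disk sub-tree format described above: one 8-bit child
// mask per node, breadth-first order.
struct Node {
    uint8_t childMask;  // bit i set => children[i] is valid
    Node*   children[8];
};

std::vector<uint8_t> serializeBreadthFirst(Node* root)
{
    std::vector<uint8_t> masks;
    std::queue<Node*> pending;
    pending.push(root);
    while (!pending.empty()) {
        Node* n = pending.front();
        pending.pop();
        masks.push_back(n->childMask);
        for (int i = 0; i < 8; ++i)  // enqueue present children in order
            if (n->childMask & (1u << i)) pending.push(n->children[i]);
    }
    return masks;  // reloading requires rebuilding links (or prefix tables)
}
```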
I was also considering a compression scheme that involves taking each possible bitmask and building a Huffman tree for predicting what a child node's bitmask would be for a parent node with that bitmask. My theory is that specific surface directions would make certain bitmasks more likely, and that some redundancy could be squeezed out by exploiting that. All just theory at the moment.
Tom Mulgrew You should check the whole shape of pixels at once. It may not seem very efficient, but in the end it is, as it allows occlusion to be optimized in the best way possible: no invisible areas will be depth-traversed in the octree at all. Also, no checks are required for already-subdivided nodes, only for further subdivisions and "mergebacks". The question is: could it be implemented in CUDA? My best variant was to split the viewport into N smaller views and use CUDA as a thread provider, if that's possible. You'd need read-only shared memory for the active nodes (unpacked, with indexes of children and colour/material information) and a small per-thread writeable buffer to collect subdivision and "mergeback" queries, to be processed asynchronously on the CPU, including the possibility of streaming data the same way it works in UD. Can CUDA work this way?
Braid Acer I'm not sure, sorry. I don't have a clear enough picture of what you're trying to do.
It sounds like for the general algorithm (sub-dividing, merging and branching) the challenge would be to minimize the amount of branching. CUDA (and GPUs in general) run batches of threads in lock-step, where each thread executes the same instruction but with different data. So if the code branches, the threads that don't follow the branch stall and wait for the ones that do to complete.
Async CPU/GPU is quite doable with CUDA, and the synchronisation is pretty straightforward. You can also overlap data transfers with GPU processing.
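A minimal sketch of the kind of async overlap CUDA allows (standard CUDA runtime API; the buffer names are hypothetical):

```cuda
#include <cuda_runtime.h>

// Minimal sketch of async CPU/GPU overlap with CUDA streams. For the copy
// to overlap with kernels, the host buffer must be page-locked
// (allocated with cudaMallocHost).
void uploadSubTreeAsync(const void* hostSubTree, void* deviceSubTree,
                        size_t bytes)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Queue the transfer; the call returns immediately.
    cudaMemcpyAsync(deviceSubTree, hostSubTree, bytes,
                    cudaMemcpyHostToDevice, stream);

    // Kernels launched on other streams (e.g. the render kernel) can run
    // while this copy is in flight:
    // renderKernel<<<grid, block, 0, renderStream>>>(...);

    cudaStreamSynchronize(stream);  // wait before traversing the new sub-tree
    cudaStreamDestroy(stream);
}
```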
Great work! Could you explain why these flickering aliasing artifacts appear (in the voxel-ray-traced parts) and how they could be avoided?
I think it's because the octree has the equivalent of mipmapping built in, but there's no linear filtering. You can't account for "this pixel is 50% between texel A and texel B" like linear texture filtering does.
You could theoretically implement something based on where the ray hits the voxel, but you would need to find the neighbouring voxels to interpolate their colours, which would be slow and expensive.
I can't think of any way to avoid this except casting multiple rays per pixel - which would slash the frame rate dramatically.
That's a great answer, thanks :)
Great
Looks cool. Wouldn't it make more sense to render more complex (non-heightmap) terrain? Did you compare the performance of this to a heightmap rendered with a tessellation shader and displacement mapping?
necrowizzard Yes, it would. In fact it should be able to cope with all sorts of general purpose scenes theoretically. I just don't have the time or artistic ability to create a large complex scene myself, whereas large heightmaps are easy to generate with L3DT. I'm working on adding some trees (and boulders, bridges, houses, etc) though.
And no, haven't compared it with any traditional polygon approach. I'd expect a tessellation shader to be faster.
Tom Mulgrew Sounds like a good approach, good luck with adding more geometry.
I only know of an article producing nice results that works the other way around (http.developer.nvidia.com/GPUGems3/gpugems3_ch01.html). They start from volume data and generate a mesh via marching cubes. But this is probably a bad reference due to the implementation overhead. I don't know of any tool using their approach (though there probably is one).
amazing
Awesome stuff! Are you still working on the Morrowind voxel conversion?
Just curious, how are you rendering the voxels? Are they cubes created with OpenGL?
+Lando Calrissian Nope :). The cubes are stored in a tree structure and ray-cast using CUDA. It computes a ray for each pixel and calculates which voxel it will hit in the tree.
Are you working for High Fidelity now? I think they are using the same technique
Nope. Never heard of them. This is just a personal project I built a couple of years ago.
What is your technical background?
clankill3r (Sorry about the delay, been busy.) My day job is web development. The 3D graphics and voxels are a spare-time hobby.
Tom Mulgrew Nice, I want to make an engine as well as a hobby project. You being able to do this is really inspiring.
Minecraft should use this. You can see so far, it's amazing!
Submit a paper to SIGGRAPH !
I'm five years late, but... Hear, hear!
This is absolutely brilliant, and should definitely gain momentum in the larger gaming sphere. Plus, the more developers there are who start working on implementations, the more refinement and performance-saving solutions there will be.
@@HickoryDickory86 is this a cutting-edge idea? I need to do a graduation paper... hahahaha
@@erich_l4644 Personally, I believe so. I haven't seen it used anywhere else by anyone else. Yes, voxels are used in some instances for LoD (see this Stack Exchange response: gamedev.stackexchange.com/a/70841 ), but this approach takes it to the next level altogether.
I would much prefer a vendor-independent solution that didn't require Nvidia (CUDA) to compile the SVOs (e.g., OpenCL), but it makes sense for Tom to use the best option available to him.
Is the source available, like Voxlap's, and in what language (C++ or Java)?
+Dark Rider sorry, I haven't released the source.
The main application is C++. The voxel rendering is CUDA.
Tom Mulgrew Is there a sparse voxel octree?
Yes. All the voxels are stored in a sparse voxel octree and rendered from there.
Do you plan to publish your code?
No concrete plans at this stage, sorry.
I still have some more development before I start thinking about what to do with this.
Man, you did what I was searching for for years. It's like an improved Voxel Space engine, as used by Novalogic in Delta Force 1 and 2. I'm not a programmer, but I'm learning some things in Unity3D, trying to remake at least the multiplayer of Delta Force, which is flawless.
Did you ever play Delta Force?
No, never played it. I checked out a couple of YouTube videos though. Looks impressive for its time.
Tom Mulgrew Yeah, it was! Look at the terrain in your video. Now imagine an 8vs8, basically a war in this flawless environment... trenches, lakes, hills, holes... just a few buildings. There are some DF1 and DF2 multiplayer videos on my channel.
Here's the question: How can I use that kind of terrain the way you did? I have some splat/normal maps and things to generate some of the old DF terrains, check this: imgur.com/a/iF6Se
Is there a pack/files available out there, or is it a private thing?
I've generated one of those terrains using the grayscale image (splat) with the default Unity3D terrain system (it takes 2 minutes), but the result is not as smooth as the Voxel Space engine created by Novalogic: a lot of vertices/polygons.
Eduardo Macedo Unity3D does have some voxel engines for download. I know there's one for about 80 bucks. There's also Efficient Sparse Voxel Octrees, which is an open source project. What Mr. Mulgrew has done here is truly impressive. DF2 was an expansive world full of great players during its time, and the type of game that is sorely missed in today's closed-down, tunnel-vision arcade shooters. A DF2-like game, with new maps, TXP maps and better graphics, more models etc., would be a dream.
increality For sure, it would be a dream! After DF1, DF2 and even DFLW and DFTFD, I never saw a game with those characteristics. I've played BF2, BF3, BF4, CODs, Insurgency and a lot of FPSs, and what Novalogic did, no one else did.
The question is: The engine in his video seems perfect for the game concept, outstanding. I can't find anything in the Unity3D store with these characteristics; I just saw these cubic voxel engines, but on this subject I'm a newbie. Am I wrong? Are there similar things out there? The bad part is: I'm a newbie and I can't waste $80 to test something...
PS: Are you in the DF2 Reunion group on Facebook? If so, what's your name?
Eduardo Macedo I was just accepted to the group, having applied for it last night. I go by Jesse; I'll add you over there now. I also played some BF2 and BFBC2. BF2 was still better than most, at least the maps were relatively larger than, say, COD or the like. Joint Ops was OK too, but too color-saturated for my tastes, and lacking the natural feel DF2 had. You could also look into 3DEMs for map creation. There's a lot of free satellite imagery, which can be converted into full, repeatable and scalable terrains. I understand the issues with not wanting to buy/test voxel software. There are similar things, full voxel, but not the hybrid polygon/voxel that he's doing. Not to my knowledge.
I don't have much experience with rendering engines at all. However, I know what a raycaster is (though I haven't yet coded one) and what a voxel is, and I know what a polygon is. I've also heard about OpenGL, which is used for 3D rendering, but it's too hard for me. You did a great explanation, and the visuals helped a lot.
You rule, sir! Check my channel header, it's a Delta Force 1 voxel terrain. Tell me, could you reproduce the terrain from a pack of a normal map, a texture, and the black-and-white image whose technical name I forget? (I could show you the three images that the Delta Force 1 engine uses to generate the map.)
Note: These are REAL voxels, not Minecraft 'voxels'.