TL;DR - it's possible to use the Z-buffer to incrementally build a signed distance field representation of a 3D scene as the camera moves around, which you can then trace rays through for approximate collision detection, reflections, ambient occlusion, GI etc.
The idea for this came from Bart Wronski's nice article "The Future of Screenspace Reflections". At the end, he mentions the idea of somehow caching geometric information between frames to improve the screen space ray marching.
The obvious surface representation to use seemed to be signed distance fields (SDFs), which have many nice properties, as discussed at length elsewhere. They can be traversed quickly using so-called "Sphere Tracing", which is basically distance-enhanced ray marching.
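For reference, the core sphere tracing loop is tiny - something like the sketch below (all the names here, sampleSDF(), _SDF, _VolumeOrigin and so on, are just made up for illustration):

```hlsl
Texture3D<float> _SDF;                 // the distance volume
SamplerState     sampler_linear_clamp;
float3 _VolumeOrigin;                  // world-space corner of the volume
float  _VolumeSize;                    // world-space extent of the volume

// Trilinear lookup of the stored distance at a world-space position.
float sampleSDF(float3 worldPos)
{
    float3 uvw = (worldPos - _VolumeOrigin) / _VolumeSize;
    return _SDF.SampleLevel(sampler_linear_clamp, uvw, 0);
}

// Sphere tracing: step along the ray by the distance to the nearest surface,
// which is exactly the value the SDF hands us at every point.
float SphereTrace(float3 origin, float3 dir, float maxDist)
{
    float t = 0.0;
    for (int i = 0; i < 64; i++)
    {
        float d = sampleSDF(origin + dir * t);
        if (d < 0.001)
            return t;       // close enough: hit
        t += d;             // safe step: no surface is nearer than this
        if (t > maxDist)
            break;
    }
    return -1.0;            // miss
}
```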
This reminded me of KinectFusion, which is a clever system which fuses together noisy depth images from the Kinect sensor to incrementally build a smooth 3D model of a scene as you move the camera around. It uses a so-called "truncated" signed distance field (stored in a 3D texture) to represent the scene. This technique was actually first described in a much older paper from 1996 about off-line alignment of depth images from 3D scanners: "A Volumetric Method for Building Complex Models from Range Images".
In real-time graphics, of course, we already have a high-quality depth image, and we know the exact camera position, so something like this ought to be a lot easier for us, right?
It's pretty simple to implement - you create a 3D SDF texture, initialise it to the maximum distance, and then each frame run a compute shader with one thread per voxel in the SDF (there's a sketch of this kernel after the list). For each voxel:
- calculate the world space position of the voxel
- project this position into camera space
- read the depth from the Z buffer at this position
- calculate the distance from the voxel to the depth buffer sample
- convert this to a "truncated" distance, or just do some math to convert the surface into a thin 3D slab (this is what I do currently)
- potentially do some kind of clever averaging with the existing distance value
- profit.
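Here's a rough sketch of what that per-voxel kernel can look like as a compute shader. It implements the plain truncated-distance variant rather than the slab trick mentioned in the list, assumes a standard (non-reversed) D3D depth buffer and a left-handed projection, and all the resource and constant names are invented for illustration:

```hlsl
#pragma kernel UpdateSDF

Texture2D<float>   _Depth;             // hardware depth buffer for this frame
RWTexture3D<float> _SDF;               // single-channel fp16 distance volume
SamplerState       sampler_point_clamp;

float4x4 _WorldToClip;                 // camera view-projection matrix
float3   _VolumeOrigin;                // world-space corner of the volume
float    _VoxelSize;                   // world-space size of one voxel
float    _Near;                        // camera near plane
float    _Far;                         // camera far plane
float    _Truncation;                  // half-thickness of the surface band

// Raw depth -> eye-space depth, assuming a standard (non-reversed) projection.
float LinearEyeDepth(float rawDepth)
{
    return _Near * _Far / (_Far - rawDepth * (_Far - _Near));
}

[numthreads(8, 8, 8)]
void UpdateSDF(uint3 id : SV_DispatchThreadID)
{
    // world-space position of the voxel centre
    float3 worldPos = _VolumeOrigin + ((float3)id + 0.5) * _VoxelSize;

    // project into clip space; skip voxels behind the camera or off screen
    float4 clipPos = mul(_WorldToClip, float4(worldPos, 1.0));
    if (clipPos.w <= 0.0) return;
    float2 uv = clipPos.xy / clipPos.w * float2(0.5, -0.5) + 0.5;
    if (any(uv < 0.0) || any(uv > 1.0)) return;

    // depth buffer sample under this voxel, converted to eye space
    float sceneDepth = LinearEyeDepth(_Depth.SampleLevel(sampler_point_clamp, uv, 0));
    float voxelDepth = clipPos.w;      // eye-space depth of the voxel itself

    // signed distance along the view ray, truncated to a thin band
    float dist = sceneDepth - voxelDepth;   // positive in front of the surface
    if (dist < -_Truncation) return;        // far behind the surface: unknown, leave as-is
    dist = min(dist, _Truncation);

    // crude blend with the existing value; KinectFusion-style weighted
    // averaging would be the obvious upgrade here
    _SDF[id] = lerp(_SDF[id], dist, 0.5);
}
```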
It turns out this works pretty well.
After a single frame, it will give similar results to any other screen space technique, but as you walk around the scene, the details and occluded parts of the SDF get filled in, and they are still maintained outside the view (at least to a certain extent).
The video shows a simple prototype done in Unity DX11. The grey geometry is the original scene, the coloured image is the ray-marched SDF (visualized as the normal calculated from the SDF gradient). At the beginning, it shows a few individual depth frames being added to the SDF. Then it switches to continuous updates as the player moves through the arch. Then continuous updates are switched off so you can see the SDF quality as the player walks back through the arch.
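The normals in the visualization are just the normalized gradient of the SDF, estimated with central differences - something like this, reusing the sampleSDF() helper from the sphere tracing sketch above:

```hlsl
// Surface normal as the normalized SDF gradient, via central differences.
// eps is typically around one voxel in size.
float3 SDFNormal(float3 p, float eps)
{
    float dx = sampleSDF(p + float3(eps, 0, 0)) - sampleSDF(p - float3(eps, 0, 0));
    float dy = sampleSDF(p + float3(0, eps, 0)) - sampleSDF(p - float3(0, eps, 0));
    float dz = sampleSDF(p + float3(0, 0, eps)) - sampleSDF(p - float3(0, 0, eps));
    return normalize(float3(dx, dy, dz));
}
```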
You can move the 3D texture to follow the player as they move around the scene, scrolling its contents. You probably don't want to do this every frame, because repeatedly refiltering the volume will blur out the details, but you can do it in blocks, every time the player has moved a certain distance.
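A block scroll can be as simple as copying the volume into a second texture, shifted by a whole number of voxels so nothing gets resampled (and therefore nothing blurs), and resetting the newly exposed region to the maximum distance. A sketch, again with made-up names:

```hlsl
#pragma kernel ScrollSDF

Texture3D<float>   _SrcSDF;       // volume before the scroll
RWTexture3D<float> _DstSDF;       // volume after the scroll
int3  _VoxelShift;                // how far the volume origin moved, in voxels
int   _Resolution;                // e.g. 128
float _MaxDistance;               // value used for "unknown / far away"

[numthreads(8, 8, 8)]
void ScrollSDF(uint3 id : SV_DispatchThreadID)
{
    // fetch from the old location; anything that scrolled in from outside
    // the old volume is reset to the maximum distance
    int3 src = (int3)id + _VoxelShift;
    bool inside = all(src >= 0) && all(src < _Resolution);
    _DstSDF[id] = inside ? _SrcSDF.Load(int4(src, 0)) : _MaxDistance;
}
```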
For reflections, this technique would only show reflections of nearby surfaces that you have looked at previously, which is kind of wacky. You could initialize the SDF around the player by rendering in the 6 cube directions to avoid this problem.
People have used the Z-buffer for approximate particle collisions (see, for example, Halo Reach Effects Tech). The problem with this is that the particles only collide against visible surfaces, so if you look away and then turn back, the particles will have fallen through the floor. With this technique, the particles will still be there. Maybe.
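The particle update against the SDF can be a few lines on top of the helpers above: sample the distance at the particle's position, and if it has ended up within its radius of a surface, push it back out along the gradient and remove the inward part of its velocity. A sketch, reusing sampleSDF() and SDFNormal() from the earlier snippets; everything else is made up:

```hlsl
#pragma kernel UpdateParticles

struct Particle
{
    float3 pos;
    float3 vel;
};

RWStructuredBuffer<Particle> _Particles;
float _DeltaTime;
float _ParticleRadius;
float _VoxelSize;                 // used as the gradient epsilon

[numthreads(64, 1, 1)]
void UpdateParticles(uint3 id : SV_DispatchThreadID)
{
    Particle p = _Particles[id.x];

    // basic integration
    p.vel += float3(0.0, -9.81, 0.0) * _DeltaTime;
    p.pos += p.vel * _DeltaTime;

    // collide against whatever the SDF currently knows about
    float d = sampleSDF(p.pos);
    if (d < _ParticleRadius)
    {
        float3 n = SDFNormal(p.pos, _VoxelSize);
        p.pos += n * (_ParticleRadius - d);        // push back to the surface
        p.vel -= n * min(dot(p.vel, n), 0.0);      // kill the inward velocity
    }

    _Particles[id.x] = p;
}
```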
Problems
- large 3D textures use quite a lot of memory, although we only need a single-channel fp16 texture here, so it's not too bad: 128*128*128*2 bytes = 4MB.
- the limited resolution of the SDF means the scene isn't represented exactly. There are artifacts (e.g. see edge of tunnel arch in the video).
- the mismatch between the resolution of the volume and the depth buffer means the depth buffer is usually under-sampled. Could generate mipmaps to help with this?
- Surfaces that are facing away from the camera (which cover a large depth range) cause problems / holes - weight these lower?
Ideas for improvements
- Use sparse (tiled) 3D textures (DX12) to avoid storing empty regions
- Capture more than one depth value per pixel - second depth layer (possibly with a minimum separation like "Fast Global Illumination Approximations on Deep G-Buffers").
- Use the SDF for improving/accelerating screen space ray marching
- Store shaded color in separate 3D texture and use for blurry reflections.
Anyway, I hadn't seen this idea described anywhere before, so I thought it was worth recording here before I forgot it! I'd be interested in hearing what other people think.