GPU-driven rendering (SIGGRAPH 2015 follow-up)

People have been asking me virtual texturing related questions. We have decided to talk more about our new virtual texturing implementation later, when we can also show some game footage. Our virtual texturing system is complex enough to warrant a series of presentations.

My older Digital Foundry interviews give some tidbits about Trials Fusion and Trials Evolution virtual texturing systems:
http://www.eurogamer.net/articles/digitalfoundry-trials-evolution-tech-interview
http://www.eurogamer.net/articles/digitalfoundry-2014-trials-fusion-tech-interview

I also recommend Ka Chen's (Far Cry 4) presentation from GDC 2015 (FC4 has procedural terrain virtual texturing similar to our Trials games / our new tech):
http://twvideo01.ubm-us.net/o1/vault/gdc2015/presentations/Chen_Ka_AdaptiveVirtualTexture.pdf
 
We can't change tile mappings on the GPU side. UpdateTileMappingsIndirect doesn't exist in DirectX.
Any reason why Direct3D 12 still doesn't support this? I didn't find any docs about it...

Unfortunately (PC) DirectX doesn't support cross lane operations. All fast CUDA and OpenCL 2.0 (radix) sorting algorithms use cross lane operations.
Same here, but the fact that you mentioned OpenCL 2.0 instead of 1.2 suggests that some hardware cannot support it, right?
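
To make "cross lane operations" concrete, here is a minimal CUDA sketch (CUDA has exposed these intrinsics since Kepler): a warp-wide inclusive prefix sum built from __shfl_up_sync, the kind of primitive the fast radix sorts lean on for their rank/scatter steps. The kernel name and launch layout are illustrative, not taken from any real sorting library.

Code:
__global__ void warpInclusiveScan(const int* in, int* out)
{
    int lane = threadIdx.x & 31;              // lane index within the warp
    int v = in[blockIdx.x * 32 + lane];

    // Kogge-Stone scan: each step pulls a partial sum from 'offset'
    // lanes below via a cross-lane shuffle; no shared memory involved.
    for (int offset = 1; offset < 32; offset <<= 1) {
        int n = __shfl_up_sync(0xffffffffu, v, offset);
        if (lane >= offset) v += n;
    }
    out[blockIdx.x * 32 + lane] = v;          // per-warp inclusive prefix sum
}

Launched for simplicity with one warp per block, e.g. warpInclusiveScan<<<numWarps, 32>>>(in, out). Standard HLSL (at the time of this thread) has no equivalent of that shuffle, so the scan would have to bounce through groupshared memory instead.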

Unfortunately DirectX 12 didn't improve the DirectCompute HLSL language.
Yes, that's sad. With multi-engine support and some hardware supporting async compute work alongside graphics, things like a more sophisticated and updated C++ AMP could become very interesting for accelerating more than just graphics and physics...
 
Same here, but the fact that you mentioned OpenCL 2.0 instead of 1.2 suggests that some hardware cannot support it, right?
Intel Broadwell and Skylake GPUs and AMD GCN Gen2 and Gen3 support OpenCL 2.0/2.1. NVIDIA has had all the same advanced compute features in CUDA for some years already (since the Kepler launch), but they don't have a recent OpenCL driver. Most likely this is because they want to push developers towards CUDA.

The biggest OpenCL 2.0/2.1 features: dynamic parallelism (GPU-side enqueue), cross lane swizzles and shared virtual memory should be supported by most DirectX 12 GPUs. Haswell, GCN Gen1 and Fermi are the only GPUs that could have some problems with these features. If FL 12_0 required these features, the GPU requirement for FL 12_0 would remain identical (Broadwell, Skylake, GCN Gen1, GCN Gen2, Kepler, Maxwell, Maxwell2).

Of course there might be some future mobile GPUs with FL 12_0 support that do not support these compute features. Hard to say, since mobile vendors have been awfully quiet about DX12.
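
To make "GPU-side enqueue" concrete: a minimal CUDA dynamic parallelism sketch (available since Kepler, sm_35, compiled with -rdc=true), where a parent kernel launches follow-up work sized by data that only exists on the GPU, without a CPU round trip. The childWork kernel and the sizing logic are made-up illustrations, not a real workload.

Code:
__global__ void childWork(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;              // placeholder per-element work
}

__global__ void parentKernel(int* data, const int* workCount)
{
    // A single thread decides how much follow-up work to enqueue.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int n = *workCount;               // amount of work known only on the GPU
        int blocks = (n + 255) / 256;
        childWork<<<blocks, 256>>>(data, n);  // GPU-side launch
    }
}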
 
Oh, one other question: both your team and the Assassin's Creed team seem to have settled on 64-triangle clusters for sub-object culling. Is this a coincidence, or did you work together? And is this number special in some way?
 
Largest primitive datatype with bit-test capability + wavefront size.
But since you don't really look inside a cluster (it's treated atomically), why would the above be relevant to cluster size?
edit - well, I can see why the wavefront size, but not the bit-test.
 
But since you don't really look inside a cluster (it's treated atomically), why would the above be relevant to cluster size?
edit - well, I can see why the wavefront size, but not the bit-test.

The shadow mapping does triangle culling; not knowing the shader, I'd guess there's a bit-test in there against the cube face that points in the light's direction. GCN was/is able to load at most 2 longs at once. It might all be coincidence, but using 64 bits looks sweet-spotty.
 
64 was chosen originally because it is the GCN wave width (and also a multiple of NVIDIA and Intel wave widths). The main reason for us was to enable GCN scalar unit optimizations (offload per-instance data to SGPRs and use the SALU), but as Ethatron said, 64 also has many other advantages.
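
To illustrate why 64 is "sweet-spotty" here, a hypothetical CUDA sketch that builds a 64-bit per-cluster triangle visibility mask with cross-lane ballots. CUDA warps are 32 wide, so each lane tests two triangles; a 64-wide GCN wavefront would map one lane per triangle and get the whole mask from a single ballot. The buffer layout and the backfaceTest predicate are invented for illustration, not the actual culling shader.

Code:
// Stand-in facing test: one plane (normal + w) per triangle.
__device__ bool backfaceTest(const float4* triPlanes, unsigned tri, float3 viewDir)
{
    float4 p = triPlanes[tri];
    return p.x * viewDir.x + p.y * viewDir.y + p.z * viewDir.z < 0.0f;
}

// One 32-wide warp per 64-triangle cluster; launch: <<<numClusters, 32>>>.
__global__ void buildClusterMasks(const float4* triPlanes,
                                  const unsigned* clusterFirstTri,
                                  float3 viewDir,
                                  unsigned long long* clusterMasks)
{
    unsigned cluster  = blockIdx.x;
    unsigned lane     = threadIdx.x;                 // 0..31
    unsigned firstTri = clusterFirstTri[cluster];

    // Each lane tests two of the cluster's 64 triangles.
    bool vis0 = backfaceTest(triPlanes, firstTri + lane,      viewDir);
    bool vis1 = backfaceTest(triPlanes, firstTri + lane + 32, viewDir);

    // Cross-lane ballots pack the per-lane predicates into bitmasks.
    unsigned lo = __ballot_sync(0xffffffffu, vis0);
    unsigned hi = __ballot_sync(0xffffffffu, vis1);

    if (lane == 0)
        clusterMasks[cluster] = ((unsigned long long)hi << 32) | lo;
}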
 
Were there any other metrics used in choosing the cluster size? Also, does the small cluster size produce too many queries per visible object? Or is it a non-issue for you given your "visible scene density"?
 
Well, I can happily say that we've reached the point in HW rendering where I've completely lost touch with the technology and have practically no idea about how stuff works... :D
All I can tell from this thread is that sebbbi is pretty damn clever, but that's no news either. I'll lurk around anyway, but can't contribute anything interesting anymore :)
 
Well, I can happily say that we've reached the point in HW rendering where I've completely lost touch with the technology and have practically no idea about how stuff works... :D
All I can tell from this thread is that sebbbi is pretty damn clever, but that's no news either. I'll lurk around anyway, but can't contribute anything interesting anymore :)

That's the attitude of a quitter.
 
http://www.geforce.com/whats-new/guides/tom-clancys-rainbow-six-siege-graphics-and-performance-guide

Same MSAA trick?
Before we dive into the combinations of options and a look at the modes you likely recognize, let's examine Temporal Filtering, found under "Multisample Anti-Aliasing".

Not to be confused with Temporal Anti-Aliasing, which reduces the flickering and shimmering of anti-aliased edges when the player's camera or view point moves, Temporal Filtering renders the game at half-resolution with 2x MSAA. In other words, a 1920x1080 picture is rendered at 960x540, and 2x MSAA is applied to smooth out the now-rougher edges.

As a result, there are the same number of depth samples as the full-resolution 1920x1080 picture, but only a quarter of the shaded samples, improving performance greatly, but also decreasing image quality. This manifests as a reduction in the quality and visibility of Ambient Occlusion shadowing, increased shader aliasing, decreased lighting and shading fidelity, and a loss of fidelity on smaller game elements, such as leaves, grass, visual effects and minute pieces of geometry.
 
They'd have to use 4xMSAA to get the same number of depth samples (and a quarter of the shaded samples) if they use 960x540 as the resolution.
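
To spell out the arithmetic: 1920x1080 gives 2,073,600 depth samples, while 960x540 with 2xMSAA gives 518,400 x 2 = 1,036,800, i.e. only half. 960x540 with 4xMSAA (518,400 x 4 = 2,073,600) is what actually matches the full-resolution depth sample count while shading a quarter of the pixels.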
 
Nope, they don't have to; they could go further and combine a 960x540 image with the 2x MSAA trick (1920x540 depth) and an alternating temporal reconstruction filter like in Killzone Shadow Fall - "whereby it combines reduced-resolution images from multiple frames to reconstruct a final image"
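
For intuition, a minimal sketch of that kind of temporal reconstruction, under the simplest possible assumptions: each frame renders every other column (the parity alternating per frame), and the missing columns are filled from the previous frame's output. The real Killzone Shadow Fall approach adds reprojection and occlusion handling; all names here are invented.

Code:
__global__ void temporalReconstruct(const float4* halfRes,  // (fullW/2) x fullH, this frame
                                    const float4* prevFull, // fullW x fullH, last frame's output
                                    float4* outFull,
                                    int fullW, int fullH, int frameParity)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= fullW || y >= fullH) return;

    if ((x & 1) == frameParity) {
        // This column was rendered this frame at half horizontal resolution.
        outFull[y * fullW + x] = halfRes[y * (fullW / 2) + x / 2];
    } else {
        // Missing column: reuse last frame's pixel. No reprojection here,
        // which is why naive versions break under motion (and why the
        // artefacts are minimized in temporally stable conditions).
        outFull[y * fullW + x] = prevFull[y * fullW + x];
    }
}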
 
Ah wait... yeah, brain fart. It sounds like the PS3 2xMSAA upscale trick, but now with a temporal component.

Guess that explains why the sawtooth edge artefact is minimized in temporally stable conditions.
 
Nope, they don't have to; they could go further and combine a 960x540 image with the 2x MSAA trick (1920x540 depth) and an alternating temporal reconstruction filter like in Killzone Shadow Fall - "whereby it combines reduced-resolution images from multiple frames to reconstruct a final image"
Yup, temporal sampling would explain it.

Certainly fun little tricks; they would be great for games that have a small amount of surface variation and are aimed at VR (especially with high MSAA sample counts).
 
By the end of the gen we will be playing games rendered natively at VGA resolutions and upscaled to 4K with all sorts of inferred lighting, MSAA tricks, temporal reprojection, post-AA, stochastic transparency, and whatnot... Many IQ nitpickers' tears are yet to be shed.
 
IIUIC, sebbbi's MSAA trick is used to interpolate the texture UV and tangent-space G-buffer layers, stored at 540p in memory, to the native 1080p targets used by a compute shader for the lighting calculations. The low UV and tangent-space resolution doesn't really matter as long as you can perfectly interpolate it to native resolution without visible quality loss, and all further shader texture fetches are done at native 1080p. So in the end the difference between pure 1080p and MSAA-trick 1080p could be close to zero, since there is no variable shading rate with the MSAA trick: shading and texture fetches are done at 1080p, only the UV and tangent frame are stored at 540p, since "UV and tangent can be interpolated across triangle surfaces with no quality loss". Variable shading rate is an interesting area for further research with this trick, though. BTW, it seems the trick is only applicable to virtual deferred texturing.
 
IIUIC, sebbbi's MSAA trick is used to interpolate the texture UV and tangent-space G-buffer layers, stored at 540p in memory, to the native 1080p targets used by a compute shader for the lighting calculations. The low UV and tangent-space resolution doesn't really matter as long as you can perfectly interpolate it to native resolution without visible quality loss, and all further shader texture fetches are done at native 1080p. So in the end the difference between pure 1080p and MSAA-trick 1080p could be close to zero, since there is no variable shading rate with the MSAA trick: shading and texture fetches are done at 1080p, only the UV and tangent frame are stored at 540p, since "UV and tangent can be interpolated across triangle surfaces with no quality loss". Variable shading rate is an interesting area for further research with this trick, though. BTW, it seems the trick is only applicable to virtual deferred texturing.
Yes. Our image quality is very close to perfect. Our MSAA trick does shading (lighting and post processing) at full resolution. It only saves performance on G-buffer overdraw. But for cases of heavy overdraw (such as foliage and trees) the G-buffer savings can be huge (up to 4x theoretical reduction in pixel shader invocations and back buffer bandwidth).
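
A hypothetical sketch of the decoupled idea being described, with all names invented: UVs live in a low-resolution buffer, while texture fetches and lighting run once per full-resolution pixel. Note the real trick relies on MSAA sample placement for exact reconstruction; the plain bilinear interpolation below is only exact inside a single triangle, so this is an illustration of the principle, not sebbbi's actual pipeline.

Code:
#include <cuda_runtime.h>

// Bilinearly interpolate a UV stored in a low-res buffer. UV varies
// linearly across a triangle, so this is exact away from triangle edges.
__device__ float2 bilerpUV(const float2* uvBuf, int w, int h, float u, float v)
{
    float fx = u * w - 0.5f;
    float fy = v * h - 0.5f;
    int x0 = min(max((int)floorf(fx), 0), w - 2);
    int y0 = min(max((int)floorf(fy), 0), h - 2);
    float tx = fx - x0, ty = fy - y0;
    float2 a = uvBuf[y0 * w + x0],       b = uvBuf[y0 * w + x0 + 1];
    float2 c = uvBuf[(y0 + 1) * w + x0], d = uvBuf[(y0 + 1) * w + x0 + 1];
    return make_float2(
        (a.x * (1 - tx) + b.x * tx) * (1 - ty) + (c.x * (1 - tx) + d.x * tx) * ty,
        (a.y * (1 - tx) + b.y * tx) * (1 - ty) + (c.y * (1 - tx) + d.y * tx) * ty);
}

// One thread per full-resolution (1080p) pixel: fetch the interpolated UV
// from the 540p buffer, then do the texture fetch (and lighting) at full rate.
__global__ void shadeFullRes(const float2* lowResUV, int lowW, int lowH,
                             cudaTextureObject_t albedoTex,
                             float4* outColor, int fullW, int fullH)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= fullW || y >= fullH) return;

    float u = (x + 0.5f) / fullW;
    float v = (y + 0.5f) / fullH;
    float2 uv = bilerpUV(lowResUV, lowW, lowH, u, v);

    // Shading rate is full resolution; only the UV storage was low resolution.
    outColor[y * fullW + x] = tex2D<float4>(albedoTex, uv.x, uv.y);
}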
 