Game development presentations - a useful reference

Doesn't seem like anyone has posted this yet.

An in-depth look at Deus Ex's new engine.
Lots of talk about GPU culling, deferred texturing and other stuff I don't completely understand. :p

"Deferred+: Next-Gen Culling and Rendering for Dawn Engine"

https://community.eidosmontreal.com/blogs/next-gen-dawn-engine


Performance seems great though. :mrgreen:

* Not actually being used by Deus Ex: Mankind Divided.
 
I'm guessing we'll see that in Mass Effect? Excited :D

Edit: "unannounced bioware title"

So a new DA, or something new.
 

Pretty cool, looks like it's all being enabled by Dx12 with a relatively large performance boost. The best thing is that it's IHV-agnostic. I wonder if any of the performance benefits will translate down to the Dx11 version of the Dawn Engine (assuming there is a Dx11 version).

Also nice that they noted that in a realistic game environment the performance benefits should be even larger. Oftentimes, performance gains seen in technology demos end up being smaller once implemented in an actual game.

I also wonder if the engine is going to use Dx12 as a base or if it's still going to use Dx11 as a base and just have another rendering path for Dx12.

Regards,
SB
 
2 videos from an Insomniac Games event last week.

Insomniac did an event called Core Dump :) last week about game development.
 
Visibility Buffer seems like a beautiful technique. And the beginning of a new era of multi-frame cached geometry techniques, etc.

There's a slide that asks "Why didn't we implement this earlier?" It's kinda depressing it's taken so long to get to this point. Finally we're getting past the stupidity of brute force rendering.

There is something wrong with MSAA resolve on GCN. Seriously, WTF?
 
Several developers have been experimenting with this technique for quite some time already. We call it a triangle id buffer. Intel's article was released last year. We had independently come up with this technique and compared it to our deferred texturing technique. In the end we chose deferred texturing (with the MSAA trick) instead. It was a better fit for our needs (performance). It also scales very well to high resolutions.

Thread comparing these techniques:
https://forum.beyond3d.com/threads/modern-textureless-deferred-rendering-techniques.57611/

GCN 3 and newer can directly read MSAA samples (for example in your lighting compute shader) without a decompression step. On older GCN the driver performs a decompress step before the RT is bound as a texture. This is slow. But on consoles you can directly read the compressed data and decompress manually in your lighting compute shader.
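
For readers wondering what "reading MSAA samples directly in a lighting compute shader" looks like in practice, here is a minimal HLSL sketch. This is not engine code; the 4x sample count, the resource names and the toy directional light are all assumptions.

Code:
// Minimal sketch: read G-buffer MSAA samples directly with Texture2DMS::Load
// in a lighting compute shader, instead of relying on a resolve/decompress pass.
// Assumes a 4x MSAA G-buffer; all resource names are hypothetical.
Texture2DMS<float4, 4> gbufferAlbedo : register(t0);
Texture2DMS<float4, 4> gbufferNormal : register(t1);
RWTexture2D<float4>    lightingOut   : register(u0);

[numthreads(8, 8, 1)]
void LightingCS(uint3 id : SV_DispatchThreadID)
{
    float3 result = 0.0;

    [unroll]
    for (uint s = 0; s < 4; ++s)
    {
        float4 albedo = gbufferAlbedo.Load(id.xy, s);
        float3 normal = normalize(gbufferNormal.Load(id.xy, s).xyz * 2.0 - 1.0);

        // Toy directional light, standing in for the real lighting loop.
        result += albedo.rgb * saturate(dot(normal, normalize(float3(0.4, 0.8, 0.2))));
    }

    // Average the per-sample results (a custom resolve could go here instead).
    lightingOut[id.xy] = float4(result * 0.25, 1.0);
}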
 
Xbox One is faster than 380 in MSAA resolve, so nothing is wrong with GCN specifically.
~4x slower on PC versus NVidia using a level-playing-field API. The GCN PC driver in this case is getting in the way of performance. Without a hack on PC, NVidia performance is in a different class. So GCN is failing on the platform where MSAA is more likely to be used. It's an anti-pattern for PC performance.

It's as if, after Xenos, where developers learnt to directly access the MSAA format for their own resolves, AMD decided to keep that concept in GCN knowing that on console the hack could be formalised, and then ignored the consequences for performance on PC.
 
Several developers have been experimenting with this technique for quite some time already. We call it a triangle id buffer. Intel's article was released last year. We had independently come up with this technique and compared it to our deferred texturing technique. In the end we chose deferred texturing (with the MSAA trick) instead. It was a better fit for our needs (performance). It also scales very well to high resolutions.

Thread comparing these techniques:
https://forum.beyond3d.com/threads/modern-textureless-deferred-rendering-techniques.57611/
This presentation may not be fair in its performance comparison (against deferred) and it doesn't dwell on a technical comparison with the Intel technique. In short, it's impossible to tell how it compares with the analysis you did in that thread (well, I don't feel up to making that comparison). It seems like a further evolution though: a visibility buffer based upon culled triangles for multiple views with a lifetime of multiple frames. That's two non-trivial multipliers for rendering efficiency right there. Sure, there are rough edges (moving camera, dynamic geometry). I can't tell how close to "game engine ready" this is. I'll have to defer to the developers on that!

When the code is released perhaps we'll get richer comparisons.
 
It seems like a further evolution though: a visibility buffer based upon culled triangles for multiple views with a lifetime of multiple frames. That's two non-trivial multipliers for rendering efficiency right there.
The main point of our SIGGRAPH paper was our fine-grained GPU cluster culling. We also do multiview culling, use last frame's results as an occlusion hint, and have similar cluster backface culling as described in this paper. Deferred texturing suited the GPU-driven pipeline really well. Just like the triangle id buffer, deferred texturing also supports variable rate shading and texture space shading. This deferred texturing technique is actually fully dependent on virtual texturing, making texture space shading/caching techniques a perfect fit for it. We used (virtual) texture space decaling and material blending already in Trials Evolution on Xbox 360 (http://www.eurogamer.net/articles/digitalfoundry-trials-evolution-tech-interview). These techniques are certainly ready for production use on consoles, but API limitations, such as limited support for multidraw, make them less viable on PC (*).

(*) DirectX 12 needs Windows 10 and doesn't support the Radeon 5000/6000 series or GeForce Fermi (no drivers despite Nvidia's promises). Vulkan has similar GPU limitations. Additionally, Intel doesn't yet have any Windows Vulkan drivers for consumers (only a developer beta), and their commitment to Vulkan is unclear. See here: https://communities.intel.com/thread/104380?start=30&tstart=0.

Quote (Intel representative): The current Plan Of Record is that Intel® is not supporting Vulkan on Windows drivers. The drivers that were made available on Developer.com are intended for Vulkan developers.
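
To make the fine-grained GPU cluster culling mentioned above a bit more concrete, here is a rough HLSL sketch of what such a culling pass tends to look like. This is not the actual SIGGRAPH/engine code: the cluster layout, buffer names and the simplified backface cone test are my assumptions. One thread handles one 64-triangle cluster; survivors feed a later multidraw/ExecuteIndirect pass.

Code:
// Hypothetical GPU cluster culling sketch: one thread per 64-triangle cluster.
struct Cluster
{
    float4 boundingSphere;   // xyz = world-space center, w = radius
    float4 normalCone;       // xyz = average triangle normal, w = conservative cutoff
    uint   firstIndex;       // offset of the cluster's triangles in the index buffer
    uint3  pad;
};

StructuredBuffer<Cluster>    clusters    : register(t0);
Texture2D<float>             hiZPyramid  : register(t1);   // depth pyramid built from the current frame
AppendStructuredBuffer<uint> visibleList : register(u0);   // indices of surviving clusters

cbuffer CullConstants : register(b0)
{
    float4   frustumPlanes[6];
    float4x4 viewProj;       // used by the (omitted) occlusion test
    float3   cameraPos;
    uint     clusterCount;
};

[numthreads(64, 1, 1)]
void CullClusters(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= clusterCount)
        return;

    Cluster c = clusters[id.x];

    // 1) Frustum test: reject clusters fully outside any plane.
    for (uint i = 0; i < 6; ++i)
        if (dot(frustumPlanes[i].xyz, c.boundingSphere.xyz) + frustumPlanes[i].w < -c.boundingSphere.w)
            return;

    // 2) Cluster backface test (simplified, ignores the bounding-sphere radius):
    //    if the whole normal cone faces away from the camera, no triangle can be front-facing.
    float3 toCluster = normalize(c.boundingSphere.xyz - cameraPos);
    if (dot(toCluster, c.normalCone.xyz) > c.normalCone.w)
        return;

    // 3) Occlusion test against hiZPyramid (project the bounds with viewProj,
    //    pick a mip, compare against the stored conservative depth) -- omitted here.

    visibleList.Append(id.x);
}

The multiview culling and the last-frame occlusion hint mentioned in the post would be additional passes or refinements layered on top of this skeleton.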
Sure, there are rough edges (moving camera, dynamic geometry). I can't tell how close to "game engine ready" this is. I'll have to defer to the developers on that!
We have noticed no issues with moving cameras or dynamic geometry with regard to GPU-driven rendering. GPU-driven rendering handles a fast-moving camera better than CPU culling, as the culling information is up to date (current frame depth buffer). CPU-based occlusion culling techniques use either last frame's data (reprojection and flickering issues) or software-rasterized low-polygon proxies (another bag of issues and significantly worse culling performance). GPU-driven culling has a huge advantage in shadow map rendering, as the scene depth buffer can be scanned to identify receiving surfaces precisely. This brings big gains over the commonly used cascaded shadow mapping technique. 3x+ gains (1/3 render time) are possible in complex scenes.
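
A rough sketch of the "scan the depth buffer to identify receiving surfaces" idea, assuming a single shadow map and a coarse tile mask. The matrix conventions, the non-reversed depth, the names and the tile granularity are all assumptions, not the presented technique verbatim.

Code:
// Hypothetical receiver-mask pass: every visible depth sample is projected into
// light space and the coarse shadow-map tile it lands in is marked. Shadow caster
// culling can then reject clusters that only overlap unmarked tiles.
Texture2D<float>  sceneDepth     : register(t0);
RWTexture2D<uint> shadowTileMask : register(u0);   // e.g. 64x64 tiles over the shadow map

cbuffer ReceiverConstants : register(b0)
{
    float4x4 invViewProj;     // main camera clip space -> world space
    float4x4 lightViewProj;   // world space -> shadow map clip space
    float2   screenSize;
    float2   tileGridSize;    // dimensions of shadowTileMask
};

[numthreads(8, 8, 1)]
void MarkShadowReceivers(uint3 id : SV_DispatchThreadID)
{
    if (any(float2(id.xy) >= screenSize))
        return;

    float depth = sceneDepth[id.xy];
    if (depth >= 1.0)            // skip sky / far plane (assumes non-reversed depth)
        return;

    // Reconstruct the world-space position of this depth sample.
    float2 ndc  = (float2(id.xy) + 0.5) / screenSize * float2(2.0, -2.0) + float2(-1.0, 1.0);
    float4 wpos = mul(invViewProj, float4(ndc, depth, 1.0));
    wpos /= wpos.w;

    // Project into the shadow map and mark the receiving tile.
    float4 spos = mul(lightViewProj, wpos);
    float2 suv  = spos.xy / spos.w * float2(0.5, -0.5) + 0.5;
    if (all(suv >= 0.0) && all(suv <= 1.0))
        InterlockedOr(shadowTileMask[uint2(suv * tileGridSize)], 1u);
}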
It's as if, after Xenos, where developers learnt to directly access the MSAA format for their own resolves, AMD decided to keep that concept in GCN knowing that on console the hack could be formalised, and then ignored the consequences for performance on PC.
Their PC compiler could generate shader instructions to load samples directly from the MSAA (CMASK) target without decompression. This should be a win in most cases. They already generate complex sequences of ALU instructions for vertex data interpolation, cubemap fetches, register indexing (older GCN cards don't support register indexing), wave-incoherent resource load/store (DX12 bindless), etc. I fail to see why Texture2DMS::Load couldn't be handled similarly by the compiler. But this is no longer an issue, since Polaris' improved DCC handles MSAA loads natively. The primitive discard accelerator also removes the bottleneck of subpixel triangles (not hitting any sample points), increasing MSAA performance. If you want to see current AMD PC results, test against Polaris.
 
Several developers have been experimenting with this technique for quite some time already. We call it a triangle id buffer. Intel's article was released last year. We had independently come up with this technique and compared it to our deferred texturing technique. In the end we chose deferred texturing (with the MSAA trick) instead. It was a better fit for our needs (performance). It also scales very well to high resolutions.

Can any of those techniques speed up motion vector buffer construction? ND is said to be running all skinning and vertex displacement twice per frame to derive the extra-accurate motion vectors needed for temporal reconstruction AA. They are effectively computing the same stuff two times (current frame and the next) because they aren't caching it. Can triangle id or other similar tech do anything to improve that?
 
A triangle id buffer only needs a simple position transform for all geometry. All other vertex attributes are accessed, interpolated and transformed only for visible triangles (however, trivial implementations replicate that math once per pixel). So it increases the base cost (of simple scenes), but makes scenes with lots of tiny triangles and/or overdraw faster (less fluctuation).
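
As a tiny illustration of how cheap that id pass is, here is a hypothetical shader pair, assuming an R32_UINT visibility buffer and a per-draw constant for the draw id (the packing scheme and names are made up).

Code:
// Hypothetical triangle id pass: the vertex shader is position-only, the pixel
// shader just packs a draw id + primitive id into an R32_UINT render target.
// All attribute fetching/interpolation is deferred to a later pass.
cbuffer PerDraw : register(b0)
{
    float4x4 worldViewProj;
    uint     drawId;
};

float4 VSMain(float3 position : POSITION) : SV_Position
{
    return mul(worldViewProj, float4(position, 1.0));
}

uint PSMain(float4 svPos : SV_Position, uint primId : SV_PrimitiveID) : SV_Target
{
    return (drawId << 24) | (primId & 0x00FFFFFF);
}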

You can reduce work by identifying duplicates in each thread group and sharing data through LDS. Or with SM 6.0 you also have cross-lane operations to swap data across lanes. Quad swizzle is guaranteed on all DX12 hardware, and this alone can already drastically reduce duplicate work.
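
A sketch of the LDS route, assuming a tiled compute shading pass over a visibility buffer. The 8x8 group size, the TriangleSetup payload and the SetupTriangle helper are all hypothetical placeholders.

Code:
// Hypothetical LDS dedup: threads that shade the same triangle share one setup.
struct TriangleSetup
{
    float4 planeX, planeY, planeZ;   // placeholder payload (e.g. interpolation planes)
};

Texture2D<uint> visibilityBuffer : register(t0);

groupshared uint          gsTriId[64];
groupshared TriangleSetup gsSetup[64];           // worst case: every id in the group is unique

TriangleSetup SetupTriangle(uint triId)
{
    // Fetch the three vertices, transform, build interpolation data... (omitted)
    return (TriangleSetup)0;
}

[numthreads(8, 8, 1)]
void ShadeTile(uint3 dtid : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    uint triId = visibilityBuffer[dtid.xy];
    gsTriId[gi] = triId;
    GroupMemoryBarrierWithGroupSync();

    // Naive duplicate search: the first thread with a given id becomes its owner.
    uint owner = gi;
    for (uint i = 0; i < gi; ++i)
    {
        if (gsTriId[i] == triId)
        {
            owner = i;
            break;
        }
    }

    // Only owners pay for the expensive vertex fetch + triangle setup.
    if (owner == gi)
        gsSetup[gi] = SetupTriangle(triId);
    GroupMemoryBarrierWithGroupSync();

    TriangleSetup setup = gsSetup[owner];
    // ... interpolate attributes and shade using 'setup' (omitted) ...
}

With SM 6.0, the same idea can be applied within a 2x2 quad using the QuadReadAcross* intrinsics, avoiding the LDS traffic entirely.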

Alternatively you could pre-transform (every frame) and store into memory all potentially visible triangles. For this to be efficient you need some sort of occlusion culling. GPU-driven cluster culling would be perfect for this purpose. The one I presented at SIGGRAPH 2015 identifies visible triangles at 64-triangle granularity (clusters). Then you could run a compute shader that goes through the visible clusters and transforms them into memory. Now the rendering pass doesn't need to transform anything, so there's no duplicate per-pixel work for big triangles. This also makes skinning and complex animation easier to handle (in a uniform way).
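
A rough sketch of that pre-transform pass, assuming 64-vertex cluster chunks and a single rigid transform standing in for skinning. All buffer layouts and names here are invented for illustration.

Code:
// Hypothetical pre-transform pass: one 64-thread group per visible cluster,
// one thread per vertex. Transformed positions are written to memory so the
// later id/shading passes never repeat the (skinning) transform.
struct ClusterInfo
{
    uint firstVertex;
    uint vertexCount;
    uint transformIndex;
    uint pad;
};

StructuredBuffer<uint>        visibleClusters      : register(t0); // output of the culling pass
StructuredBuffer<ClusterInfo> clusterInfos         : register(t1);
StructuredBuffer<float3>      sourcePositions      : register(t2);
StructuredBuffer<float4x4>    objectTransforms     : register(t3); // skinning matrices would replace this
RWStructuredBuffer<float3>    transformedPositions : register(u0); // world-space, consumed by later passes

[numthreads(64, 1, 1)]
void PreTransformClusters(uint3 groupId : SV_GroupID, uint3 gtid : SV_GroupThreadID)
{
    ClusterInfo c = clusterInfos[visibleClusters[groupId.x]];
    if (gtid.x >= c.vertexCount)
        return;

    uint v = c.firstVertex + gtid.x;
    float4 worldPos = mul(objectTransforms[c.transformIndex], float4(sourcePositions[v], 1.0));

    // Skinning / morph targets / procedural displacement would go here instead.
    transformedPositions[v] = worldPos.xyz;
}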
 