Game development presentations - a useful reference

Doesn't seem like anyone has posted this yet.

An in-depth look at Deus Ex's new engine.
Lots of talk about GPU culling, deferred texturing and other stuff I don't completely understand. :p

"Deferred+: Next-Gen Culling and Rendering for Dawn Engine"

https://community.eidosmontreal.com/blogs/next-gen-dawn-engine


Performance seems great though. :mrgreen:

* Not actually being used by Deus Ex: Mankind Divided.
 
I'm guessing we'll see that in Mass Effect? Excited :D

Edit: "unannounced bioware title"

So a new DA, or something new.
 

Pretty cool, looks like it's all being enabled by Dx12 with a relatively large performance boost. The best thing is that it's IHV-agnostic. I wonder if any of the performance benefits will translate down to the Dx11 version of the Dawn Engine (assuming there is a Dx11 version).

Also nice that they noted that in a realistic game environment the performance benefits should be even larger. Oftentimes, performance gains seen in technology demos end up being smaller once implemented in an actual game.

I also wonder if the engine is going to use Dx12 as a base or if it's still going to use Dx11 as a base and just have another rendering path for Dx12.

Regards,
SB
 
2 videos from an Insomniac Games event last week.

Insomniac did an event called Core Dump :) last week about game development.
 
Visibility Buffer seems like a beautiful technique. And the beginning of a new era of multi-frame cached geometry techniques, etc.

There's a slide that asks "Why didn't we implement this earlier?" It's kinda depressing it's taken so long to get to this point. Finally we're getting past the stupidity of brute force rendering.

There is something wrong with MSAA resolve on GCN. Seriously, WTF?
 
Several developers have been experimenting with this technique for quite some time already. We call it a triangle id buffer. Intel's article was released last year. We had independently come up with this technique and compared it to our deferred texturing technique. In the end we chose deferred texturing (with the MSAA trick) instead. It was a better fit for our needs (performance). It also scales very well to high resolutions.

Thread comparing these techniques:
https://forum.beyond3d.com/threads/modern-textureless-deferred-rendering-techniques.57611/

GCN 3 and newer can directly read MSAA samples (for example in your lighting compute shader) without a decompression step. On older GCN the driver performs a decompress step before the RT is bound as a texture. This is slow. But on consoles you can directly read the compressed data and decompress manually in your lighting compute shader.
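
For readers wondering what "reading MSAA samples directly in a lighting compute shader" looks like in practice, here is a minimal HLSL sketch. This is not engine code; the 4x sample count, the resource names and the toy directional light are all assumptions.

Code:
// Minimal sketch: read G-buffer MSAA samples directly with Texture2DMS::Load
// in a lighting compute shader, instead of relying on a resolve/decompress pass.
// Assumes a 4x MSAA G-buffer; all resource names are hypothetical.
Texture2DMS<float4, 4> gbufferAlbedo : register(t0);
Texture2DMS<float4, 4> gbufferNormal : register(t1);
RWTexture2D<float4>    lightingOut   : register(u0);

[numthreads(8, 8, 1)]
void LightingCS(uint3 id : SV_DispatchThreadID)
{
    float3 result = 0.0;

    [unroll]
    for (uint s = 0; s < 4; ++s)
    {
        float4 albedo = gbufferAlbedo.Load(id.xy, s);
        float3 normal = normalize(gbufferNormal.Load(id.xy, s).xyz * 2.0 - 1.0);

        // Toy directional light, standing in for the real lighting loop.
        result += albedo.rgb * saturate(dot(normal, normalize(float3(0.4, 0.8, 0.2))));
    }

    // Average the per-sample results (a custom resolve could go here instead).
    lightingOut[id.xy] = float4(result * 0.25, 1.0);
}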
 
Xbox One is faster than 380 in MSAA resolve, so nothing is wrong with GCN specifically.
~4x slower on PC versus NVidia using a level-playing-field API. The GCN PC driver in this case is getting in the way of performance. Without a hack on PC, NVidia performance is in a different class. So GCN is failing on the platform where MSAA is more likely to be used. It's an anti-pattern for PC performance.

It's as if, after Xenos, where developers learnt to directly access the MSAA format for their own resolves, AMD decided to keep that concept in GCN knowing that on console the hack could be formalised, and then ignored the consequences for performance on PC.
 
Several developers have been experimenting with this technique for quite some time already. We call it a triangle id buffer. Intel's article was released last year. We had independently come up with this technique and compared it to our deferred texturing technique. In the end we chose deferred texturing (with the MSAA trick) instead. It was a better fit for our needs (performance). It also scales very well to high resolutions.

Thread comparing these techniques:
https://forum.beyond3d.com/threads/modern-textureless-deferred-rendering-techniques.57611/
This presentation may not be fair in its performance comparison (against deferred) and it doesn't dwell on a technical comparison with the Intel technique. In short, it's impossible to tell how it compares with the analysis you did in that thread (well, I don't feel up to making that comparison). It seems like a further evolution though: a visibility buffer based upon culled triangles for multiple views with a lifetime of multiple frames. That's two non-trivial multipliers for rendering efficiency right there. Sure, there are rough edges (moving camera, dynamic geometry). I can't tell how close to "game engine ready" this is. I'll have to defer to the developers on that!

When the code is released perhaps we'll get richer comparisons.
 
It seems like a further evolution though: a visibility buffer based upon culled triangles for multiple views with a lifetime of multiple frames. That's two non-trivial multipliers for rendering efficiency right there.
The main point of our SIGGRAPH paper was our fine-grained GPU cluster culling. We also do multiview culling, use last frame's results as an occlusion hint, and have similar cluster backface culling as described in this paper. Deferred texturing suited the GPU-driven pipeline really well. Just like the triangle id buffer, deferred texturing also supports variable rate shading and texture space shading. This deferred texturing technique is actually fully dependent on virtual texturing, making texture space shading/caching techniques a perfect fit for it. We used (virtual) texture space decaling and material blending already in Trials Evolution on Xbox 360 (http://www.eurogamer.net/articles/digitalfoundry-trials-evolution-tech-interview). These techniques are certainly ready for production use on consoles, but API limitations, such as limited support for multidraw, make them less viable on PC (*).

(*) DirectX 12 needs Windows 10 and doesn't support the Radeon 5000/6000 series or GeForce Fermi (no drivers despite Nvidia's promises). Vulkan has similar GPU limitations. Additionally, Intel doesn't yet have any Windows Vulkan drivers for consumers (only a developer beta), and their commitment to Vulkan is unclear. See here: https://communities.intel.com/thread/104380?start=30&tstart=0.

Quote (Intel representative): The current Plan Of Record is that Intel® is not supporting Vulkan on Windows drivers. The drivers that were made available on Developer.com are intended for Vulkan developers.
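
To make the fine-grained GPU cluster culling mentioned above a bit more concrete, here is a rough HLSL sketch of what such a culling pass tends to look like. This is not the actual SIGGRAPH/engine code: the cluster layout, buffer names and the simplified backface cone test are my assumptions. One thread handles one 64-triangle cluster; survivors feed a later multidraw/ExecuteIndirect pass.

Code:
// Hypothetical GPU cluster culling sketch: one thread per 64-triangle cluster.
struct Cluster
{
    float4 boundingSphere;   // xyz = world-space center, w = radius
    float4 normalCone;       // xyz = average triangle normal, w = conservative cutoff
    uint   firstIndex;       // offset of the cluster's triangles in the index buffer
    uint3  pad;
};

StructuredBuffer<Cluster>    clusters    : register(t0);
Texture2D<float>             hiZPyramid  : register(t1);   // depth pyramid built from the current frame
AppendStructuredBuffer<uint> visibleList : register(u0);   // indices of surviving clusters

cbuffer CullConstants : register(b0)
{
    float4   frustumPlanes[6];
    float4x4 viewProj;       // used by the (omitted) occlusion test
    float3   cameraPos;
    uint     clusterCount;
};

[numthreads(64, 1, 1)]
void CullClusters(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= clusterCount)
        return;

    Cluster c = clusters[id.x];

    // 1) Frustum test: reject clusters fully outside any plane.
    for (uint i = 0; i < 6; ++i)
        if (dot(frustumPlanes[i].xyz, c.boundingSphere.xyz) + frustumPlanes[i].w < -c.boundingSphere.w)
            return;

    // 2) Cluster backface test (simplified, ignores the bounding-sphere radius):
    //    if the whole normal cone faces away from the camera, no triangle can be front-facing.
    float3 toCluster = normalize(c.boundingSphere.xyz - cameraPos);
    if (dot(toCluster, c.normalCone.xyz) > c.normalCone.w)
        return;

    // 3) Occlusion test against hiZPyramid (project the bounds with viewProj,
    //    pick a mip, compare against the stored conservative depth) -- omitted here.

    visibleList.Append(id.x);
}

The multiview culling and the last-frame occlusion hint mentioned in the post would be additional passes or refinements layered on top of this skeleton.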
Sure, there are rough edges (moving camera, dynamic geometry). I can't tell how close to "game engine ready" this is. I'll have to defer to the developers on that!
We have noticed no issues with moving cameras or dynamic geometry with regard to GPU-driven rendering. GPU-driven rendering handles a fast-moving camera better than CPU culling, as the culling information is up to date (current frame depth buffer). CPU-based occlusion culling techniques use either last frame's data (reprojection and flickering issues) or software-rasterized low-polygon proxies (another bag of issues and significantly worse culling performance). GPU-driven culling has a huge advantage in shadow map rendering, as the scene depth buffer can be scanned to identify receiving surfaces precisely. This brings big gains over the commonly used cascaded shadow mapping technique. 3x+ gains (1/3 render time) are possible in complex scenes.
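
A rough sketch of the "scan the depth buffer to identify receiving surfaces" idea, assuming a single shadow map and a coarse tile mask. The matrix conventions, the non-reversed depth, the names and the tile granularity are all assumptions, not the presented technique verbatim.

Code:
// Hypothetical receiver-mask pass: every visible depth sample is projected into
// light space and the coarse shadow-map tile it lands in is marked. Shadow caster
// culling can then reject clusters that only overlap unmarked tiles.
Texture2D<float>  sceneDepth     : register(t0);
RWTexture2D<uint> shadowTileMask : register(u0);   // e.g. 64x64 tiles over the shadow map

cbuffer ReceiverConstants : register(b0)
{
    float4x4 invViewProj;     // main camera clip space -> world space
    float4x4 lightViewProj;   // world space -> shadow map clip space
    float2   screenSize;
    float2   tileGridSize;    // dimensions of shadowTileMask
};

[numthreads(8, 8, 1)]
void MarkShadowReceivers(uint3 id : SV_DispatchThreadID)
{
    if (any(float2(id.xy) >= screenSize))
        return;

    float depth = sceneDepth[id.xy];
    if (depth >= 1.0)            // skip sky / far plane (assumes non-reversed depth)
        return;

    // Reconstruct the world-space position of this depth sample.
    float2 ndc  = (float2(id.xy) + 0.5) / screenSize * float2(2.0, -2.0) + float2(-1.0, 1.0);
    float4 wpos = mul(invViewProj, float4(ndc, depth, 1.0));
    wpos /= wpos.w;

    // Project into the shadow map and mark the receiving tile.
    float4 spos = mul(lightViewProj, wpos);
    float2 suv  = spos.xy / spos.w * float2(0.5, -0.5) + 0.5;
    if (all(suv >= 0.0) && all(suv <= 1.0))
        InterlockedOr(shadowTileMask[uint2(suv * tileGridSize)], 1u);
}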
It's as if, after Xenos, where developers learnt to directly access the MSAA format for their own resolves, AMD decided to keep that concept in GCN knowing that on console the hack could be formalised, and then ignored the consequences for performance on PC.
Their PC compiler could generate shader instructions to load samples directly from the MSAA (CMASK) target without decompression. This should be a win in most cases. They already generate complex sequences of ALU instructions for vertex data interpolation, cubemap fetches, register indexing (older GCN cards don't support register indexing), wave-incoherent resource load/store (DX12 bindless), etc. I fail to see why Texture2DMS::Load couldn't be handled similarly by the compiler. But this is no longer an issue, since Polaris' improved DCC handles MSAA loads natively. The primitive discard accelerator also removes the bottleneck of subpixel triangles (not hitting any sample points), increasing MSAA performance. If you want to see current AMD PC results, test against Polaris.
 
Several developers have been experimenting with this technique for quite some time already. We call it a triangle id buffer. Intel's article was released last year. We had independently come up with this technique and compared it to our deferred texturing technique. In the end we chose deferred texturing (with the MSAA trick) instead. It was a better fit for our needs (performance). It also scales very well to high resolutions.

Can any of those techniques speed up motion vector buffer construction? ND is said to be running all skinning and vertex displacement twice per frame to derive the extra-accurate motion vectors needed for temporal reconstruction AA. They are effectively computing the same stuff two times (current frame and the next) because they aren't caching it. Can triangle id or other similar tech do anything to improve that?
 
A triangle id buffer only needs a simple position transform for all geometry. All other vertex attributes are accessed, interpolated and transformed only for visible triangles (however, trivial implementations replicate that math once per pixel). So it increases the base cost (of simple scenes), but makes scenes with lots of tiny triangles and/or overdraw faster (less fluctuation).
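
As a tiny illustration of how cheap that id pass is, here is a hypothetical shader pair, assuming an R32_UINT visibility buffer and a per-draw constant for the draw id (the packing scheme and names are made up).

Code:
// Hypothetical triangle id pass: the vertex shader is position-only, the pixel
// shader just packs a draw id + primitive id into an R32_UINT render target.
// All attribute fetching/interpolation is deferred to a later pass.
cbuffer PerDraw : register(b0)
{
    float4x4 worldViewProj;
    uint     drawId;
};

float4 VSMain(float3 position : POSITION) : SV_Position
{
    return mul(worldViewProj, float4(position, 1.0));
}

uint PSMain(float4 svPos : SV_Position, uint primId : SV_PrimitiveID) : SV_Target
{
    return (drawId << 24) | (primId & 0x00FFFFFF);
}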

You can reduce work by identifying duplicates in each thread group and sharing data through LDS. Or with SM 6.0 you also have cross-lane operations to swap data across lanes. Quad swizzle is guaranteed on all DX12 hardware, and this alone can already drastically reduce duplicate work.
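
A sketch of the LDS route, assuming a tiled compute shading pass over a visibility buffer. The 8x8 group size, the TriangleSetup payload and the SetupTriangle helper are all hypothetical placeholders.

Code:
// Hypothetical LDS dedup: threads that shade the same triangle share one setup.
struct TriangleSetup
{
    float4 planeX, planeY, planeZ;   // placeholder payload (e.g. interpolation planes)
};

Texture2D<uint> visibilityBuffer : register(t0);

groupshared uint          gsTriId[64];
groupshared TriangleSetup gsSetup[64];           // worst case: every id in the group is unique

TriangleSetup SetupTriangle(uint triId)
{
    // Fetch the three vertices, transform, build interpolation data... (omitted)
    return (TriangleSetup)0;
}

[numthreads(8, 8, 1)]
void ShadeTile(uint3 dtid : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    uint triId = visibilityBuffer[dtid.xy];
    gsTriId[gi] = triId;
    GroupMemoryBarrierWithGroupSync();

    // Naive duplicate search: the first thread with a given id becomes its owner.
    uint owner = gi;
    for (uint i = 0; i < gi; ++i)
    {
        if (gsTriId[i] == triId)
        {
            owner = i;
            break;
        }
    }

    // Only owners pay for the expensive vertex fetch + triangle setup.
    if (owner == gi)
        gsSetup[gi] = SetupTriangle(triId);
    GroupMemoryBarrierWithGroupSync();

    TriangleSetup setup = gsSetup[owner];
    // ... interpolate attributes and shade using 'setup' (omitted) ...
}

With SM 6.0, the same idea can be applied within a 2x2 quad using the QuadReadAcross* intrinsics, avoiding the LDS traffic entirely.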

Alternatively you could pre-transform (every frame) and store into memory all potentially visible triangles. For this to be efficient you need some sort of occlusion culling. GPU-driven cluster culling would be perfect for this purpose. The one I presented at SIGGRAPH 2015 identifies visible triangles at 64-triangle granularity (clusters). Then you could run a compute shader that goes through the visible clusters and transforms them into memory. Now the rendering pass doesn't need to transform anything, so there's no duplicate per-pixel work for big triangles. This also makes skinning and complex animation easier to handle (in a uniform way).
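
A rough sketch of that pre-transform pass, assuming 64-vertex cluster chunks and a single rigid transform standing in for skinning. All buffer layouts and names here are invented for illustration.

Code:
// Hypothetical pre-transform pass: one 64-thread group per visible cluster,
// one thread per vertex. Transformed positions are written to memory so the
// later id/shading passes never repeat the (skinning) transform.
struct ClusterInfo
{
    uint firstVertex;
    uint vertexCount;
    uint transformIndex;
    uint pad;
};

StructuredBuffer<uint>        visibleClusters      : register(t0); // output of the culling pass
StructuredBuffer<ClusterInfo> clusterInfos         : register(t1);
StructuredBuffer<float3>      sourcePositions      : register(t2);
StructuredBuffer<float4x4>    objectTransforms     : register(t3); // skinning matrices would replace this
RWStructuredBuffer<float3>    transformedPositions : register(u0); // world-space, consumed by later passes

[numthreads(64, 1, 1)]
void PreTransformClusters(uint3 groupId : SV_GroupID, uint3 gtid : SV_GroupThreadID)
{
    ClusterInfo c = clusterInfos[visibleClusters[groupId.x]];
    if (gtid.x >= c.vertexCount)
        return;

    uint v = c.firstVertex + gtid.x;
    float4 worldPos = mul(objectTransforms[c.transformIndex], float4(sourcePositions[v], 1.0));

    // Skinning / morph targets / procedural displacement would go here instead.
    transformedPositions[v] = worldPos.xyz;
}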
 