Deferred Rendering on PS3 (Was KZ2 Tech Discussion)

So in the case of all six SPEs working together, it's highly unlikely that they do a straight SPE-to-GPU transfer. Couldn't they set up a program to, say, do a direct SPE-to-GPU transfer using queue flags for urgency (or some other clever bit of programming)?

I.e. each SPE does, say, four graphical tasks, sends them to the GPU, and flags the importance of each task for processing order, something like how networks use flags for data prioritisation?
Unless I'm mistaken, sounds like you're pretty much describing pipelining ;)

That's the only way you're really going to get so many operations working concurrently anyway. As opposed to sending one instruction after another, they are all in some stage of execution at the same time. Future applications/engines will hope to be cleverer with the pipelining process to get as much performance as possible out of the fixed hardware, be it by revising the order in which instructions are executed, or by introducing completely different implementations where the instructions *may* be different altogether.

Ofc, as you can imagine, nothing is straightforward about sending all these tasks back and forth so there's always room for improvement.
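To make that concrete, here's a minimal sketch in plain C++ of a prioritised producer/consumer queue between workers and a single consumer. This is generic illustration code, not actual Cell SDK calls; GpuTask, its fields, and the queue class are all made up:

Code:
#include <condition_variable>
#include <mutex>
#include <queue>

// A finished piece of SPU work, tagged with an urgency flag,
// much like QoS flags prioritise packets on a network.
struct GpuTask {
    int priority;   // higher = consume first
    int payloadId;  // stand-in for a completed result buffer
    bool operator<(const GpuTask& other) const { return priority < other.priority; }
};

class TaskQueue {
    std::priority_queue<GpuTask> tasks;  // always pops the most urgent task
    std::mutex m;
    std::condition_variable cv;
public:
    // Producers (the "SPEs") push completed tasks with an urgency flag.
    void push(const GpuTask& t) {
        { std::lock_guard<std::mutex> lock(m); tasks.push(t); }
        cv.notify_one();
    }
    // The consumer (the thread feeding the "GPU") pops by priority,
    // so urgent work jumps the queue instead of going strictly FIFO.
    GpuTask pop() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !tasks.empty(); });
        GpuTask t = tasks.top();
        tasks.pop();
        return t;
    }
};

A real implementation would of course use DMA lists and command-buffer writes rather than locks, but the ordering idea is the same.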
 
Regarding post processing tasks, I expect the overhead of syncing multiple SPU threads to be insignificant compared to syncing with GPU, especially considering the asymmetric CPU bandwidth.
 
Code:
CPU TIME
--------
SPU Sync ......... 0.06%
...
...
...
GPU Sync ......... 37.99%
----------

Seems that you are correct. This is what you are talking about, right? :?:
 
Yes it is, but I'm not sure what that profiler is measuring. For all I know, RSX sync may be done by an infinite non-blocking loop checking data sent by RSX, while SPU sync may be done by blocking messaging.
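For illustration only (plain C++, nothing PS3-specific), here's why the two numbers wouldn't be comparable: a busy-wait poll burns CPU time that a profiler charges to the sync, while a blocking wait sleeps and shows up as almost nothing, even if the wall-clock wait is the same:

Code:
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical flags set by the GPU / an SPU when work completes.
std::atomic<bool> gpu_done{false};
std::mutex m;
std::condition_variable cv;
bool spu_done = false;

// Busy-wait poll: the CPU spins the whole time, so a profiler
// charges the entire wait to "GPU Sync" as CPU time.
void sync_with_gpu_by_polling() {
    while (!gpu_done.load(std::memory_order_acquire)) {
        // spin: 100% CPU while waiting
    }
}

// Blocking wait: the thread sleeps until signalled, so "SPU Sync"
// shows almost no CPU time even if the wall-clock wait is as long.
void sync_with_spu_by_blocking() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return spu_done; });
}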
 
Yeah, I wouldn't read too much into those CPU numbers. My guess is that everything but GPU Sync is covered in the total, and unlike the SPU numbers, that "Total Time" ranges from 0-100% (sum of individual items / 2 hyperthreads), so the sum of the individual times can be above 100%. GPU Sync might simply be the time required for all GPU commands to complete as seen from the CPU side, but the CPU is most likely doing something else instead of just waiting on the GPU during that time.

Also keep in mind that it was quite a challenge just to read anything in that low-resolution video, so assume the numbers are wrong and perhaps from different frames...
 
I know I REALLY shouldn't go there...
Mod : Then don't ;)
*snip*

Although I also think that, with its seemingly amazing culling and the deferred rendering, KZ2 is pushing boundaries in more ways than just post-processing on SPUs.

I agree with the second bit though, didn't extensive culling (thanks to the scripted camera) make God of War the visual showcase it is on PS2?
 
Culling is used extensively in quite a few games. It's not the only ingredient for KZ2's prettiness.

I'd say it's not what they did; it's how they did it and integrated "everything" (art and technology) nicely together that's impressive -- especially when you hear people complaining about insufficient memory.

I am also curious how many resources KZ2's forward renderer takes (amongst all the deferred rendering tasks), and whether their workflow is any different from the usual.
 
Can we get a polygon estimate from someone courageous, for this level or certain objects? http://www.gametrailers.com/player/44970.html

The yellow rails, clearly visible from 20 seconds onward, look damn near perfectly round.



[MOD: No one deleted your post. Posts from new members sometimes have to wait for moderator approval otherwise the boards would be inundated with spam]
 
KZ2 is a visual masterpiece, but has a lot of smoke and mirrors!

Effects are really what make games look good nowadays.

*merged posts*

I dunno if that pic is going to show up, so here's the link. Sorry guys.

http://img89.imageshack.us/my.php?image=kz22om9.png
 
Smoke and mirrors is the name of the game for all rasterization. Triangles and texture maps have always been the quick and dirty way of getting 3D on the screen. You judge an engine by how well it hides its shortcuts through misdirection. It's all an illusion, after all.
 
That's a very good point. It's interesting to me that the KZ2 crew cited Blade Runner as one of their inspirations for the look. I was just watching a behind-the-scenes DVD on Blade Runner, and for the visual look they mention the only way they were able to get it to look so convincing was the combination of filming at night, the constant rain, and the constant fog. Without those, none of it looked convincing. KZ2 is similar: keeping everything dark and gritty, with all that crap floating around all the time, etc., it's all carefully chosen to maximize the illusion.
 
Is GTA IV's renderer a deferred renderer (with a G-buffer and other DR stuff)?


It's an early implementation of Engel's Light Pre-pass renderer, which is different from the Killzone 2 model. In the general sense, they are both deferring the lighting aspect, but the implementations differ quite a bit.
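As a rough sketch of the structural difference, with stub passes in plain C++ (the pass names are mine, not Engel's or Guerrilla's exact terminology):

Code:
// Stub passes standing in for real GPU work (illustration only).
void fillFatGBuffer()     {}  // depth, normals, albedo, specular, ...
void fillThinGBuffer()    {}  // depth + normals only
void lightingPass()       {}  // accumulate per-light contributions
void secondGeometryPass() {}  // re-draw geometry, sampling the light buffer

// Full deferred shading (KZ2-style): geometry is drawn once into a fat
// G-buffer, and all material data for lighting is read back from it.
void renderFullDeferred() {
    fillFatGBuffer();
    lightingPass();
}

// Light pre-pass (Engel-style): a thin G-buffer is enough to accumulate
// lighting into a light buffer; geometry is then drawn a second time
// and combines its own materials with the stored lighting.
void renderLightPrePass() {
    fillThinGBuffer();
    lightingPass();
    secondGeometryPass();
}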
 
This article says that LittleBigPlanet also used deferred rendering. Was this well known? (inFamous is another Sony title that uses it)
Link


....Something else that further encouraged its (deferred rendering's) adoption has been the work carried out by Sony's internal R&D team to standardize deferred rendering as something that can be used on Playstation 3.
As well as being widely distributed within Sony studios, the results of this labour have found their way into some high profile multi-format games. Collaboration with Rockstar resulted in its use in the RAGE technology that powers GTA IV, for example. Media Molecule's LittleBigPlanet and Guerilla's Killzone 2 are two Sony backed games that make the most out of the additional control and sophistication it enables in terms of game lighting.


Jan-Bart van Beek, art and animation director at Guerilla, on deferred rendering's advantages:
"....because you take all the lighting calculations out of your shaders, it makes them a lot less complicated. This means your artists can create the shaders, not the programmers. We used Maya's shading editor to make our game shaders. And because the cost of these shaders is low, you can create specific looks for specific objects instead of having to use general templates."


Tim Sweeney, architect of the Unreal engine, highlights further challenges of using deferred rendering: "It's faster for large numbers of lights and shadows, but the drawbacks are increased video memory usage, and artistic limitations as you force all objects to be rendered with the same material model." He says UE3 has "an extremely flexible and artist-extensible material system, so we didn't want to constrain this unnecessarily."
Another issue is anti-aliasing. "Anti-aliasing is a key to rendering quality in Gears of War," Sweeney explains. "If you look closely, you'll see that all static and dynamic lighting is anti-aliased with MSAA, so moving to a pure deferred rendering approach would be a step backwards."

Significantly, though, UE3 does use some deferred elements, re-using z and colour buffers for techniques that would otherwise be impractical, such as velocity-buffered motion blur.

(Develop- Issue 91-Feb 2009)
 
I have some time tonight after hitting a milestone. Found this on SCEA's R&D site:
http://research.scea.com/ps3_deferred_shading.pdf

I can't remember if it's old but there are some numbers.

EDIT:
Guerilla Games's presentation in Develop Conference 2007 talked about their implementation: http://www.develop-conference.com/d.../vwsection/Deferred Rendering in Killzone.pdf

Insomniac also has very interesting slides on their R&D page: http://www.insomniacgames.com/tech/techpage.php
You should be able to find things like memory budget (for different subsystems), schedule for physics calculations, etc. in some of the papers.

I wish I had time to assimilate them.

Thanks for the links Pastu :cool: :smile:

I will have a go at those PDFs later. I do not like dark games too much, but KZ2's nice dynamic lighting, special effects, and post-processing make this game stand out from any other game I have seen. Of course Crysis is the most advanced game, but I think KZ2 is at the top of the hill ATM when it comes to consoles :cool:
 
It's an early implementation of Engel's Light Pre-pass renderer, which is different from the Killzone 2 model. In the general sense, they are both deferring the lighting aspect, but the implementations differ quite a bit.

Deferred Rendering is defined by creating a G-Buffer, isn't it? So no G-Buffer = no Deferred Rendering. :?:
 
Light indexed deferred renderers do not have g-buffers. You have light index buffers instead.

We were evaluating a lot of different deferred shading (and forward shading) methods for our new Xbox 360 game, and LIDR was the best method when used in combination with stencil shadows (our older PC SM2.0/3.0 engine used stencil shadows). However, LIDR is not that well suited for shadow-mapped lights, and stencil shadows are no longer that attractive a shadowing method (high geometry complexity and depth complexity cause huge overdraw and fill rate cost... and non-geometry occluders cannot be supported).
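A minimal sketch of the LIDR idea in CPU-side C++ (buffer layout and names are illustrative, assuming up to four 8-bit light indices packed per pixel): each pixel stores which lights touch it, not material attributes:

Code:
#include <cstdint>
#include <vector>

// Light-indexed deferred: per pixel we store up to four 8-bit light
// indices (packed as one RGBA8-style texel), not material attributes.
struct LightIndexBuffer {
    int width, height;
    std::vector<std::uint32_t> texels;  // 4 packed light indices per pixel

    LightIndexBuffer(int w, int h) : width(w), height(h), texels(w * h, 0) {}

    // Rasterising a light's volume calls this for every covered pixel:
    // shift the previous indices up and insert the new one
    // (the oldest index drops off; max 4 lights per pixel).
    void insertLight(int x, int y, std::uint8_t lightId) {
        std::uint32_t& t = texels[y * width + x];
        t = (t << 8) | lightId;
    }
};
// The normal geometry pass then unpacks the (up to) four indices per
// pixel and looks up each light's parameters in a small table, so the
// material shaders stay forward-style while lights are culled per pixel.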

So in the end we were bouncing back and forth between forward rendering and standard deferred shading. Our deferred renderer used a KZ2/Uncharted-style screen split into quads, rendering multiple lights per quad at once; however, we added a depth bounds test to the lights to speed up performance (a downscaled depth buffer storing the min and max depth of each quad).
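A sketch of that quad classification step in plain C++ (names and structure are illustrative, not our actual engine code): each quad keeps the min/max depth of its pixels, and a light is skipped for the quad whenever their depth ranges don't overlap:

Code:
#include <algorithm>
#include <cfloat>

// One screen quad's depth bounds, built from a downscaled depth buffer.
struct QuadBounds { float minZ = FLT_MAX, maxZ = 0.0f; };

// A light volume's view-space depth range (illustrative).
struct LightRange { float nearZ, farZ; };

// Fold one pixel's depth into the quad's bounds while downscaling.
void accumulate(QuadBounds& q, float pixelDepth) {
    q.minZ = std::min(q.minZ, pixelDepth);
    q.maxZ = std::max(q.maxZ, pixelDepth);
}

// Depth bounds test: shade the light in this quad only if the two
// depth ranges overlap; otherwise it can't touch any pixel in the quad.
bool lightAffectsQuad(const QuadBounds& q, const LightRange& l) {
    return l.nearZ <= q.maxZ && l.farZ >= q.minZ;
}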

According to our testing, deferred shading seems to be the best fit for performance this generation when antialiasing is not used, and the performance advantage should increase in the future. In our testing, deferred shading had slightly better performance even when all local lights were disabled: only a single sunlight with a 3-split PSSM shadow map with ESM filtering (for soft shadows) and grayscale alpha masks (for transparent occluders).
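For reference, the basic ESM test looks roughly like this (a standalone C++ sketch of the standard technique, not our shader code; the constant c controls edge softness):

Code:
#include <algorithm>
#include <cmath>

// Exponential shadow mapping (ESM): the shadow map stores
// exp(c * occluderDepth), which can be filtered linearly (blurred,
// mipmapped) for soft edges. The receiver then evaluates a smooth
// falloff instead of a hard binary depth comparison.
float esmShadow(float filteredExpOccluder, float receiverDepth, float c = 80.0f) {
    // visibility ~= exp(c * (occluderDepth - receiverDepth)), clamped to 1.
    return std::min(1.0f, filteredExpOccluder * std::exp(-c * receiverDepth));
}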

The slight deferred shading performance advantage was mainly explained by reduced shader overdraw:
1. Hi-Z culling is not pixel precise: overdraw near all depth boundaries (fences etc. have 2x overdraw, 3x for two fences, etc.)
2. Hi-Z culling bit depth is limited: 2x overdraw on surfaces near each other (signs, posters on walls, etc.)
3. Object inner overdraw: object polygons are only presorted (Tipsify), and the sort is not perfect from all view angles. In the worst cases this can cause 2-3x overdraw on a single object.
The average pixel overdraw was around 1.4x for our average scene (we depth sorted objects by center points), and around 4x for the most pathological cases (looking at a wall with a large poster through a fence). Deferred shading guaranteed that we only process each light once per pixel. The cost of the g-buffer rendering with only one light was almost equal to the cost saved by the average 1.4x overdraw (the lighting shader is very expensive compared to the shader that renders the g-buffer). And with multiple local lights the deferred shader got considerably faster.
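To spell out that arithmetic with made-up cost units (illustrative numbers only, not measurements): the cheap g-buffer fill still pays the overdraw, but the expensive lighting then runs exactly once per pixel, so every additional light tips the balance further toward deferred:

Code:
#include <cstdio>

int main() {
    // Illustrative per-pixel costs in made-up units, not measurements.
    const float lightCost   = 10.0f; // shading one light (expensive)
    const float gbufferCost = 2.5f;  // writing the g-buffer (cheap)
    const float overdraw    = 1.4f;  // average shader overdraw

    for (int lights = 1; lights <= 4; ++lights) {
        // Forward: every overdrawn pixel re-runs the full lighting.
        float forward = overdraw * lights * lightCost;
        // Deferred: only the cheap g-buffer fill pays the overdraw;
        // each light is then shaded exactly once per pixel.
        float deferred = overdraw * gbufferCost + lights * lightCost;
        printf("%d light(s): forward %.1f vs deferred %.1f\n",
               lights, forward, deferred);
    }
    return 0;
}

With these numbers, one light comes out almost equal (14.0 vs 13.5) and four lights clearly favour deferred (56.0 vs 43.5).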

Other reasons for the deferred renderer's performance increase (over the forward renderer) were the much-reduced need for state changes and light processing during scene rendering (this mainly reduced CPU load), and that we could use more precompiled static shader versions instead of relying on static branching in shaders (a small perf hit for the shaders). With deferred shading, the lighting and g-buffer rendering shaders can be completely separated, and this dramatically reduces the number of different shader combinations needed (n+m instead of n*m).
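The combination count is easy to illustrate (hypothetical variant counts):

Code:
#include <cstdio>

int main() {
    const int materialVariants = 20; // hypothetical material shader variants
    const int lightSetups      = 8;  // hypothetical light count/type permutations

    // Forward: lighting is baked into each geometry shader -> n*m programs.
    printf("forward:  %d shader combinations\n", materialVariants * lightSetups);
    // Deferred: g-buffer and lighting shaders compile independently -> n+m.
    printf("deferred: %d shader combinations\n", materialVariants + lightSetups);
    return 0;
}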

I expect many developers to reach similar conclusions in the future. However, deferred shading has its weaknesses. The extra antialiasing cost is the main issue. The extra memory usage is not that big a deal, and according to our testing deferred shading only consumes very little extra bandwidth (less overdraw means fewer sampled shadow map pixels, etc...).
 
According to our testing, deferred shading seems to be the best fit for performance this generation when antialiasing is not used, and the performance advantage should increase in the future.

/cry... /sob...

No AA? /sigh.

I can't count how many games visually just go into the toilet (for me) when I see all the jaggies crawling around. GT5 could have looked so good, for example, but it looks like utter poo in motion, with horrible crawling aliasing, in what might otherwise have been a fantastic graphics showcase.

Here's to hoping the next-next-gen consoles get it right and enforce the use of AA. /sigh...

Regards,
SB
 