Deferred Rendering on PS3 (Was KZ2 Tech Discussion)

I don't believe so.

I believe he is still at a multiplat development studio, but he has moved over to the PS3 side, I think due to the lack of experienced programmers. I wish I remembered which thread that was mentioned in.

I've found his posts to be easier to read and understand. He often posts information that is not widely known (at least to me) as well.
 
I assume that you're still primarily developing on the PS3. I don't recall you mentioning CPU skinning in the VMX thread. Are you devs using them more effectively now?

Yeppers still PS3, haven't touched a 360 kit in about a year. You definitely want to cpu skin any chance you get on PS3, on 360 there is usually no need to bother. I mentioned it just to make a point that it's possible to do, and that there are some special case scenarios that you would do it on 360 (like on instanced crowd), but you usually just let the unified shaders shred through the work.
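For anyone unfamiliar with the term, CPU skinning means blending each vertex by its bone transforms on the CPU side (SPUs on PS3, VMX on 360) instead of in the vertex shader. Below is a minimal scalar sketch of standard linear blend skinning. It assumes 3x4 row-major bone matrices and weights that sum to 1; the function names are made up for illustration, nothing here is taken from any actual engine, and real code would be vectorized.

```python
def transform(bone, p):
    # bone: 3x4 row-major matrix (rotation in columns 0-2, translation in column 3)
    # p: (x, y, z) position in bind pose
    return tuple(
        bone[r][0] * p[0] + bone[r][1] * p[1] + bone[r][2] * p[2] + bone[r][3]
        for r in range(3)
    )


def skin_vertex(position, influences, bones):
    # influences: list of (bone_index, weight); weights are assumed to sum to 1
    out = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        moved = transform(bones[bone_index], position)
        for axis in range(3):
            out[axis] += weight * moved[axis]
    return tuple(out)


# Hypothetical two-bone setup: one bone at rest, one shifted 2 units along x.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shifted = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]
blended = skin_vertex((1.0, 1.0, 1.0), [(0, 0.5), (1, 0.5)], [identity, shifted])
print(blended)  # (2.0, 1.0, 1.0): halfway between the two bone results
```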
 
joker454 said:
If you are lucky and can reuse a buffer from elsewhere then cool, if not then that's more memory eaten.
I don't see what luck has to do with it. If this is cross-platform render code, you already have to time-share all render-buffers on the 360, hence you can time-share them on any other platform.
 
Question on the subject of deferred renderers. Could the KZ2 engine make use of megatextures? Is there any reason that this could pose some challenges (memory, bandwidth)? Because out of anything that could be added to this engine, I think that megatexturing would be a noticeable upgrade.
 
I don't see what luck has to do with it. If this is cross-platform render code, you already have to time-share all render-buffers on the 360, hence you can time-share them on any other platform.

On 360 you render particles right into edram along with the opaque pass, which then all gets resolved out to the main 1280x720 scene in main ram. There is no extra 1/4 sized buffer necessary. There are some games that have extreme overdraw that may need a 1/4 size buffer even on 360, but for most games the speed of edram is plenty fast and alpha can be treated as free, so no 1/4 size buffer necessary at all.

On PS3 you render your opaque pass into a 1280x720 buffer in main ram, but you render particles separately into a 640x360 buffer elsewhere in main ram because you will almost always run out of memory bandwidth on the transparency pass. This is a separate buffer from the main scene, and one not needed in the 360 render path. Eventually you blend this 1/4 buffer back with the main scene on PS3, but in the meantime it sits around eating additional memory. Depending on your post process steps it may or may not be easy to reuse an existing 1/4 sized buffer (assuming you even have one).
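To put a rough number on the "additional memory", here is a quick back-of-the-envelope sketch. The 4 bytes per pixel is an assumption (an RGBA8-style target); the post does not say what format the particle buffer uses, and the helper name is made up.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit colour target, not stated in the post


def buffer_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    # Size of a single non-MSAA render target
    return width * height * bytes_per_pixel


full_res = buffer_bytes(1280, 720)   # main scene buffer
quarter = buffer_bytes(640, 360)     # extra particle buffer on the PS3 path

print(full_res / 2**20)  # size of the main scene buffer in MiB
print(quarter / 2**20)   # size of the extra 1/4 buffer in MiB
```

Under that assumption the extra buffer is a quarter of the full-resolution one, a bit under 1 MiB, and the cost grows if the format is wider or more than one such buffer is alive at once.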
 
Lost Planet on 360 has the same particle resolution as in Killzone 2 and they don't look anything near the level of Killzone 2's particles. There are some tricks going on with the way they are rendered (everybody knows they blend particles into 4xMSAA).

Let's wait and see their slides at GDC09.
 
And this is something I don't believe anymore. It might have been true for high level stuff like treatment of textures or shaders etc. But when it comes to the specifics of CELL/PS3 there's almost nothing left that can be used on 360 (imo) because it's a totally different architecture.

I believe they are referring to the principle of data locality and any memory/bandwidth optimization techniques, which should be valid for all. NAO HDR, "first" applied in Heavenly Sword, would be an outstanding example. Besides, if it's true for treatment of textures and shaders as you said, then my statement is not wrong.

Are you a game programmer with actual coding experience? If not, how can you tell?

You mean the "equal" output eventually, if the developers so desired it? Because they can manage the scope and specs better once they know what to avoid and what can be relied on.
 
...
KZ2 isn't pulling off real HDR right? RSX has issues with AA and HDR if I remember.. Is it using nAO32?

What is "unreal HDR" then?
For me it's all VALVe's fault with their LDR -> Bloom -> HDR Lighting. And a lot of people think bloom is the opposite of HDR rendering.

Yes, KZ2 doesn't store its "L-Buffer" as a floating-point framebuffer (the STALKER guys do this and then use it for post-processing effects like bloom with HDR). But remember: Source Engine doesn't use FP render targets either, and yet there is "HDR+AA".
On the other hand, you don't need to store something in an FP framebuffer if it's not necessary.
 
Let's pretend the last 3 hours never happened in this thread and carry on where we were. ;) Sorry for any collateral damage in removing the unwanted guest.
 
I don't see what luck has to do with it. If this is cross-platform render code, you already have to time-share all render-buffers on the 360, hence you can time-share them on any other platform.

If you were a little crazy and you had the ability to easily render a split screen, you could also manually tile (effectively "time-share") on the PS3 to reduce the amount of memory needed for render buffers used pre-post process. So if you really wanted to on the PS3, you could reduce the 360 advantage of EDRAM in terms of memory utilization in MSAA cases. Of course this would increase the vertex pressure, which likely wouldn't be the best of ideas on the PS3.
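A rough sketch of the memory side of that idea: if the frame is rendered as N horizontal strips, the pre-post-process buffers only need to hold one strip at a time. The surface count, sample format and function name below are assumptions for illustration, and the sketch ignores the extra vertex work mentioned above.

```python
import math


def tiled_buffer_bytes(width, height, bytes_per_sample, msaa_samples, surfaces, tiles):
    # Memory needed for the render buffers of ONE horizontal tile.
    # surfaces: number of same-sized surfaces alive at once (e.g. colour + depth)
    rows_per_tile = math.ceil(height / tiles)
    return width * rows_per_tile * bytes_per_sample * msaa_samples * surfaces


# Hypothetical setup: 1280x720, 4 bytes per sample, 2xMSAA, colour + depth
whole_frame = tiled_buffer_bytes(1280, 720, 4, 2, 2, tiles=1)
two_tiles = tiled_buffer_bytes(1280, 720, 4, 2, 2, tiles=2)

print(whole_frame / 2**20, two_tiles / 2**20)  # MiB needed untiled vs. split in two
```

The saving scales with the tile count, but every object that straddles a tile boundary has to be submitted again for each tile it touches, which is where the extra vertex pressure comes from.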
 
I think people forget sometimes that the 360 has 3 cpu cores and hence 3 vector units. The PPC cores might be poop, but the vector units aren't bad. Stuff like "cpu side skinning" may look and sound good on slides, but it's cake to implement on the 360's VMX unit if you so choose.

Yeah, but then again there's not so much else left you can do so it's really not suitable for all kinds of games (on 360) due to the limited resources. The vector units might not be bad, but there's other things that have a major influence here (like bandwidth and communication overhead).

Still it's not (yet) done on 360, so there's not so much to argue about.

Are you a game programmer with actual coding experience? If not, how can you tell?

Yes, but no games ;)
 
http://www.gamekings.tv/index/videos/minidocu-the-company-behind-killzone-2-full-version-subbed/

Pretty interesting 40 minute Behind the Scenes of the making of Killzone 2. They reveal more info about the engine and their approach. You even get a look at the PS3 dev kit. They also confirm that Cell is used along with the GPU for graphics.

Can someone take a look at the devkit numbers, memory usage and stuff, and interpret it? :)

I think the dev stuff starts around 17 mins give or take.
 
joker454 said:
On 360 you render particles right into edram along with the opaque pass, which then all gets resolved out to the main 1280x720 scene in main ram
Ok under those conditions I'll give you it depends on luck a bit :p
It will depend on a bunch of things if it can be done - but still, most games should have some offscreen render-buffers sitting around for time-sharing. Considering even back in PS2 generation it wasn't uncommon to have 10+ offscreen targets if you did anything worthwhile with pixels, and that was without using shadowmaps.

TimothyFarrar said:
If you were a little crazy and you had the ability to easily render a split screen, you could also manually tile
Might not be that crazy at all - if you already run your geometry pipeline through SPUs, it could just be a natural extension. But then tiling is something you can just drop in the last second, while memory limitations usually pop-up at that very time. And I doubt many PS3 devs would bother to plan for tiling ahead of time - though if you are multiplatform, maybe...
 
And this is something I don't believe anymore. It might have been true for high level stuff like treatment of textures or shaders etc. But when it comes to the specifics of CELL/PS3 there's almost nothing left that can be used on 360 (imo) because it's a totally different architecture.

Perhaps you should at least try to code something before you make such a statement?
 
Can someone take a look at the devkit numbers, memory usage and stuff, and interpret it? :)

I think the dev stuff starts around 17 mins give or take.

Mmmm, KZ2 bean spilling.

Gain about 20 to 40 percent extra speed having SPUs do GPU work, so really not far off from my 15-30% estimation.

Below is my best attempt at reading all the debug stats. Note, had to take from a bunch of frames to gather these numbers, also had to guess at a few of the numbers.

Code:
CPU TIME
--------
Unknown .......... 1.24%
SPU Sync ......... 0.06%
AI Manager ....... 0.47%
Game Logic ....... 9.52%
Script ........... 0.80%
Physics .......... 1.57%
Representation ... 10.46%
Draw ............. 20.18%
HUD .............. 2.19%
Sound ............ 0.65%
Profile HUD ...... 25.17%
GPU Sync ......... 37.99%
----------
Total Time ....... 36.85%


SPU TIME
----------------------------
AI.Cover ................... ........ 0.00%
AI.LineOfFire .............. ........ 0.00%
Anim.EdgeAnim .............. 33 ..... 2.01%
Anim.Skinning .............. 152 .... 30.68%
Gfx.DecalUpdate ............ 9 ...... 0.78%
Gfx.LightProbes ............ 396 .... 9.00%
Gfx.PB.DeferredSchedule .... 1 ...... 0.60%
Gfx.PB.Forward ............. 2 ...... 1.69%
Gfx.PB.Geometry ............ 1 ...... 18.67%
Gfx.PB.Lights .............. 1 ...... 0.66%
Gfx.PB.ShadowMap ........... 1 ...... 4.20%
Gfx.Particles.ManagerJob ... 1 ...... 3.14%
Gfx.Particles.UpdateJob .... 130 .... 12.33%
Gfx.Particles.VertexJob .... 70 ..... 20.64%
Gfx.Post.BloomCapture ...... 12 ..... 2.80%
Gfx.Post.BloomIntegrate .... 8 ...... 1.52%
Gfx.Post.DepthOfField ...... 64 ..... 12.12%
Gfx.Post.DepthToFuzzy ...... 8 ...... 0.67%
Gfx.Post.Downsample ........ 29 ..... 0.61%
Gfx.Post.GrainWeight ....... 1 ...... 0.51%
Gfx.Post.HBlur ............. 45 ..... 3.02%
Gfx.Post.ILR ............... 1 ...... 0.63%
Gfx.Post.Modulate .......... 27 ..... 1.3?%
Gfx.Post.MotionBlur ........ 46 ..... 11.31%
Gfx.Post.Unlock? ........... 1 ...... 0.01%
Gfx.Post.Upsample .......... 108 .... 9.47%
Gfx.Post.VBlur ............. 46 ..... 3.73%
Gfx.Post.Vg??lle ........... 1 ...... 1.18%
Gfx.Post.Zero .............. 16 ..... 0.64%
Gfx.Scene.Portals .......... 3 ...... 30.72%
Mesh.Decompression ......... ........ 0.00%
Physics.Collide ............ 4 ...... 2.48%
Physics.Integrate .......... 4 ...... 2.11%
Physics.KdTree ............. 8 ...... 20.50%
Physics.Raycast ............ ........ 0.00%
Snd.MP3.Stereo ............. 2 ...... 2.60%
Snd.MP3.Surround ........... 2 ...... 7.51%
Snd.?Synth ................. 35 ..... 3.23%
Snd.Reverb ................. 14 ..... 4.02%
---------------------------- 
Total Time ................. 1232 ... 227.46%


GRAPHICS
--------
FPS ................. 30
GPU Stall by CPU .... 0.123 ?s
CPU stall by GPU .... 12.231 ?s


GPU TIME
--------------------------
Unknown  ....... 0.2?/ ... 3.43%
Geometry ....... 1.8?/ ... 43.37%
Lighting ....... 1.7?/ ... 14.??%
Effects ........ 8.5?/ ... 8.4?%
Post process ............. 18.31%
--------------------------
Total Time ............... 81.??%
GPU Stall ................ 0.??%


PRIMS / TRI
-----------
Totals ..... 1431/ ... 344,634
Prime? ..... 0/ ...... 0 
Geometry ... 619/ .... 161,231
Shadow ..... 683/ .... 170,???
Effects .... 121/ .... 14,3??


MEMORY STATS
------------------------
Pushbuffer ???? ........ 0.15 MB
Pushbuffer High ........ 0.15 MB
VRAM Free .............. 23.43 MB
Host Free .............. 80.?? MB
Heap Free .............. 134.?? MB
Render Mem ???? ........ 0
Render Mem Used ........ 12.00 MB
Render Mem Watermark ... 12.00 MB


MAIN RAM ....... 101.00 MB
----------------
Physics ........ 5.30 MB
Collision ...... 3.72 MB
Sound .......... 16.25 MB
Mesh ........... 21.20 MB
Graphics ....... 6.53 MB
Animation ...... 34.45 MB
Texture ........ 0.56 MB
Shader ......... 1.46 MB
AI Data ........ 2.75 MB
Various ........ 3.32 MB
Waste .......... 5.27 MB
----------------
Total .......... 97.17 MB
Main RAM ??? ... 97 / 101


VIDEO RAM .. 190.04 MB
------------
Mesh ....... 15.99 MB
Texture .... 156.87 MB
Waste ...... 1.20 MB
Total ...... 174.08 MB
 
Regarding SPU time, what does that 227.46% mean? Is it that they're using slightly more than the equivalent of 2 SPUs' processing power at that time?

Also, the memory numbers seem too low and there's a discrepancy between the numbers at the top ("main ram" and "video ram") and the Totals below. Could it be the difference between something like Heap size and actual allocated memory?

Edit: And since we're at it, what's your opinion regarding KZ2's tech? I hope you dissect it in your blog eventually, R2-style. :)

Edit2: you already did lol
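One way to look at that discrepancy is to add up the per-category lines and compare them with the printed totals and the header figures. The sketch below uses the numbers exactly as transcribed in the stats dump. Treating the header values (101.00 MB and 190.04 MB) as budgets is only a guess, and since some digits in the dump were guessed, a mismatch may simply be a misread digit.

```python
# Values copied from the transcribed debug stats (MB)
main_ram = {
    "Physics": 5.30, "Collision": 3.72, "Sound": 16.25, "Mesh": 21.20,
    "Graphics": 6.53, "Animation": 34.45, "Texture": 0.56, "Shader": 1.46,
    "AI Data": 2.75, "Various": 3.32, "Waste": 5.27,
}
video_ram = {"Mesh": 15.99, "Texture": 156.87, "Waste": 1.20}

main_sum = round(sum(main_ram.values()), 2)
video_sum = round(sum(video_ram.values()), 2)

# Header figure minus printed total, read here as "budget minus used" (assumption)
main_headroom = round(101.00 - 97.17, 2)
video_headroom = round(190.04 - 174.08, 2)

print(main_sum, "vs printed total 97.17")
print(video_sum, "vs printed total 174.08")
print(main_headroom, video_headroom)
```

The video RAM categories land within a couple of hundredths of the printed total, while the main RAM categories add up to a few MB more than the printed 97.17 MB, which suggests at least one main RAM line was misread from the video.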
 
From what I understand, in high-end console games the use of a heap or dynamic memory is very strongly discouraged.

The memory not mentioned is probably for things like buffers (The G-Buffer is not even listed and that is nearly 40MB) and prefetch caches for streaming from disk and such.
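For a sense of where a figure like that comes from, here is a rough estimate. The layout is an assumption on my part, not something stated in this thread: four 32-bit colour targets plus one 32-bit depth/stencil surface at 1280x720 with 2x MSAA.

```python
WIDTH, HEIGHT = 1280, 720
MSAA_SAMPLES = 2        # assumption
COLOUR_TARGETS = 4      # assumption: four RGBA8-style targets
DEPTH_SURFACES = 1      # assumption: one 24/8 depth/stencil surface
BYTES_PER_SAMPLE = 4    # 32 bits per sample for every surface

samples = WIDTH * HEIGHT * MSAA_SAMPLES
gbuffer_bytes = samples * (COLOUR_TARGETS + DEPTH_SURFACES) * BYTES_PER_SAMPLE

print(gbuffer_bytes / 1e6)    # size in MB (10^6 bytes)
print(gbuffer_bytes / 2**20)  # size in MiB (2^20 bytes)
```

Under those assumptions the total comes out in the mid-30s of MB, the same ballpark as the "nearly 40MB" above.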



Regarding SPU time, what does that 227.46% mean? Is it that they're using slightly more than the equivalent of 2 SPUs' processing power at that time?

In this video the lead engine guy explains that in most scenes they are using at least 2 SPUs, but if it gets really busy they will use up to 6.
http://www.youtube.com/watch?v=ummHRrA7D_Y&NR=1
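Read that way, the 227.46% total is easy to convert. The sketch below assumes each percentage is measured against one SPU's share of the frame, so 100% means one SPU fully busy; that interpretation is a guess based on the explanation above, and the function names are made up.

```python
def spu_equivalents(total_percent):
    # 100% is assumed to mean one SPU busy for the whole frame
    return total_percent / 100.0


def utilisation(total_percent, spu_count):
    # Fraction of the available SPU time in use across spu_count SPUs
    return total_percent / (100.0 * spu_count)


busy_spus = spu_equivalents(227.46)
share_of_six = utilisation(227.46, 6)

print(busy_spus)     # roughly 2.27 SPUs' worth of work in that frame
print(share_of_six)  # roughly 0.38 of the time available on 6 SPUs
```

So the captured frame is a little over 2 SPUs' worth of work, which fits the "at least 2 SPUs in most scenes" comment and leaves a lot of room before all 6 are saturated.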
 
inefficient said:
From what I understand, in high-end console games the use of a heap or dynamic memory is very strongly discouraged.
That was up to and including PS1 generation.
Dynamic allocation became quite common in the PS2 era (almost everyone wrote their own allocators though), and this generation you have certain highly popular middleware engines that can chalk up 10k+ dynamic allocations per frame all on their own.
 