Laa-Yosh said:
Also consider that they have to fill the G-buffer for each frame.
Thanks, this is exactly what I was missing in my bandwidth calculations!
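A rough back-of-the-envelope sketch of that G-buffer fill cost follows. The resolution, sample count, number of render targets, bytes per target and frame rate are all assumptions picked for illustration, not figures from the talk, and the estimate ignores overdraw and any compression.

```python
def gbuffer_fill_bytes(width, height, samples, color_targets,
                       bytes_per_target=4, depth_stencil_bytes=4):
    """Bytes written to fill the G-buffer once (no overdraw, no compression)."""
    # Every stored sample carries all color targets plus depth/stencil.
    per_sample = color_targets * bytes_per_target + depth_stencil_bytes
    return width * height * samples * per_sample


def bandwidth_gb_per_s(bytes_per_frame, fps):
    """Sustained write bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_frame * fps / 1e9


# Assumed setup: 1280x720, 2 samples per pixel, 4 color targets, 30 fps.
frame_bytes = gbuffer_fill_bytes(1280, 720, samples=2, color_targets=4)
print(f"{frame_bytes / 2**20:.2f} MiB written per frame")
print(f"{bandwidth_gb_per_s(frame_bytes, 30):.3f} GB/s at 30 fps")
```

This covers only the write that fills the G-buffer. Reading it back in the light pass adds to the total, so treat the number as a lower bound.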
So all physics on the PPU, and two SPUs not in use? Do the blanks mean that the SPUs idle a lot?
As far as I remember the talk, he said that game logic, AI and physics each run a number of their own SPU tasks, and the gray boxes are supposedly these tasks. He did not focus on this part at all (it is not part of rendering), so I suppose it is there just to give a rough idea that there is SPU activity before the PPU hits the rendering part. I would say physics on that scale has to be on the SPUs.
The flow of that part of the presentation was roughly:
"The PPU orchestrates game logic, AI and physics (each with their own SPU tasks), then there is time to prepare the draw data that changed (again with some SPU tasks). The display list generator is launched as soon as possible on an SPU, and it launches its own sub-tasks that do skinning, edge geom... then there is shadow map rendering in parallel. In the meantime the PPU moves on to update logic for the next frame (the red lock thing + the next bars on the PPU side)..."
As I remember it, all of the "colored" SPU bars belong to the rendering of one frame (so not to the PPU stuff after, and including, the "data lock" bar). Rendering therefore leaves the PPU at a single point during the "Prepare Draw" bar, and the PPU can then do general work for the next frame.
He did not say that the picture shows real load, and to me it looks like a very rough high-level overview for the masses.
Fran was there as well, so he might fill in what I missed.
I'd run a full screen pass that read all the subsamples...
Thanks nAo. That sounds like a very practical and logical solution. I wonder whether they were already doing it but traded it for the early stencil culling used for the light volumes and the "sun" (considering how much they rely on early stencil culling in the light pass).
nAo said:
EDIT: just realized that this stuff only works if current multisampling implementations supersample depth AND stencil, and not just depth. Unfortunately I don't know if they do that, though it would make a lot of sense.
I would say both are stored properly per sample on edges (depth and stencil end up stored together in 32 bits).
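To illustrate what "stored together in 32 bits" could look like, here is a minimal sketch that packs one sample's depth and stencil into a single word. The 24/8 split and the bit order (depth in the high bits) are assumptions based on the common D24S8 layout, not something confirmed for this hardware, and the function names are made up for the example.

```python
DEPTH_BITS = 24    # assumed: 24-bit fixed-point depth
STENCIL_BITS = 8   # assumed: 8-bit stencil
DEPTH_MAX = (1 << DEPTH_BITS) - 1
STENCIL_MAX = (1 << STENCIL_BITS) - 1


def pack_d24s8(depth, stencil):
    """Pack depth in [0, 1] and an 8-bit stencil into one 32-bit word."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    if not 0 <= stencil <= STENCIL_MAX:
        raise ValueError("stencil must fit in 8 bits")
    quantized = round(depth * DEPTH_MAX)
    # Assumed layout: depth in the upper 24 bits, stencil in the lower 8.
    return (quantized << STENCIL_BITS) | stencil


def unpack_d24s8(word):
    """Return (depth, stencil) from a packed 32-bit word."""
    return (word >> STENCIL_BITS) / DEPTH_MAX, word & STENCIL_MAX


# Two subsamples of one edge pixel, each with its own depth and stencil.
edge_pixel = [pack_d24s8(0.25, 0x01), pack_d24s8(0.75, 0x00)]
for word in edge_pixel:
    print(hex(word), unpack_d24s8(word))
```

If each subsample keeps its own packed word like this, stencil is per sample for free whenever depth is, which is the case nAo's idea needs.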