Teasy said:
Yes, there may need to be some sort of Z-buffer there (unless they do the sorting before sending to the chip with a scene manager, which is perfectly possible) to connect the two pieces. But that is still not what you described earlier... unless I misunderstood what you meant by the comment I originally quoted?
Right. I'll see if I can re-explain my point of view on this issue:
Assumption:
The deferred rendering architecture in question does all sorting on the card. That is, at the software level, it acts exactly like an immediate-mode renderer. This is an absolute necessity for modern graphics cards, particularly those with hardware T&L, where screen-space positions are only known after the card itself has transformed the vertices.
Given this, the hardware needs every single triangle in the frame to be passed to it before it begins rasterization, at least if it is to act as a pure deferred renderer. (The performance impact of this can be reduced with double buffering, and the Kyro apparently uses a form of it, though the scene buffer memory is freed once a tile is rasterized, so it doesn't need the full memory for two frames.) Since triangles can arrive in any order, the hardware has no idea which tiles a triangle will touch, or what its depth values will be, until that triangle is actually sent. So if the scene buffer overflows and the hardware compensates by doing a second rendering pass, there is absolutely no way around keeping a full external z-buffer: the depth of every pixel rendered in the first pass has to be saved so that the remaining triangles can be tested against it.
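To make the overflow argument concrete, here is a toy Python model. It is a sketch under heavy simplifying assumptions, not how the Kyro or any real chip works: "triangles" are axis-aligned rectangles at constant depth, the scene buffer is a hypothetical fixed budget of tile entries, and `Rect`, `render`, `capacity` and `keep_depth` are illustrative names. When the buffer would overflow, it renders what has been binned so far; the second pass only produces the correct image if each pixel's depth from the first pass was written out to an external z-buffer.

```python
from dataclasses import dataclass

# Hypothetical toy screen: 8x8 pixels split into 4x4-pixel tiles.
W, H, TILE = 8, 8, 4
INF = float("inf")


@dataclass(frozen=True)
class Rect:
    """Simplified 'triangle': axis-aligned, half-open [x0, x1) x [y0, y1), constant depth."""
    x0: int
    y0: int
    x1: int
    y1: int
    z: float
    color: str


def tiles_touched(r):
    """Yield the (tx, ty) tiles a primitive overlaps; only known once it arrives."""
    for ty in range(r.y0 // TILE, (r.y1 - 1) // TILE + 1):
        for tx in range(r.x0 // TILE, (r.x1 - 1) // TILE + 1):
            yield (tx, ty)


def render(prims, capacity, keep_depth=True):
    """Bin primitives per tile; if the scene buffer would overflow, render early."""
    color = [[None] * W for _ in range(H)]
    zbuf = [[INF] * W for _ in range(H)]  # external (off-chip) z-buffer
    bins = {}    # scene buffer: tile -> primitives binned to it
    used = 0     # scene-buffer entries in use
    passes = 0

    def flush():
        nonlocal bins, used, passes
        passes += 1
        for (tx, ty), plist in bins.items():
            for y in range(ty * TILE, (ty + 1) * TILE):
                for x in range(tx * TILE, (tx + 1) * TILE):
                    # On-chip depth starts from the saved z-buffer value;
                    # without it, the depth from earlier passes is lost.
                    best_z = zbuf[y][x] if keep_depth else INF
                    best_c = color[y][x]
                    for p in plist:
                        if p.x0 <= x < p.x1 and p.y0 <= y < p.y1 and p.z < best_z:
                            best_z, best_c = p.z, p.color
                    color[y][x] = best_c
                    if keep_depth:
                        zbuf[y][x] = best_z  # spill depth to external memory
        bins, used = {}, 0

    for p in prims:
        touched = list(tiles_touched(p))
        # Scene buffer would overflow: render what we have so far (extra pass).
        if used and used + len(touched) > capacity:
            flush()
        for t in touched:
            bins.setdefault(t, []).append(p)
        used += len(touched)
    if used:
        flush()
    return color, passes


# Usage: the near (red) rectangle is submitted before the far (blue) one.
near = Rect(0, 0, 8, 8, 0.1, "red")
far = Rect(0, 0, 8, 8, 0.9, "blue")
one_pass, _ = render([near, far], capacity=8)                # everything fits
two_pass, _ = render([near, far], capacity=4)                # overflow, depth saved
no_z, _ = render([near, far], capacity=4, keep_depth=False)  # overflow, depth lost
print(one_pass == two_pass)  # True
print(no_z[0][0])            # blue: the far rectangle wrongly wins
```

Note that the extra z-buffer traffic is exactly the memory cost a deferred renderer is supposed to avoid, which is why overflowing the scene buffer hurts so much.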
Another option for handling a scene-buffer overflow would be to dynamically increase the size of the scene buffer. This would obviously cause a performance stall, probably an even larger one than the extra rendering pass described above.
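A toy sketch of the dynamic-growth option, assuming (for illustration only) that a full scene buffer is reallocated at twice its size and its contents copied across. `GrowableSceneBuffer` and the doubling policy are hypothetical; the point is just that each growth step stalls binning for an amount of work proportional to everything already binned in the frame.

```python
class GrowableSceneBuffer:
    """Hypothetical scene buffer that reallocates (doubles) when it fills up."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = [None] * capacity
        self.size = 0
        self.copied = 0  # total entries moved by reallocations (the stall)

    def append(self, entry):
        if self.size == self.capacity:
            # Overflow mid-frame: allocate a larger block and copy every
            # entry binned so far before binning can continue.
            new_entries = [None] * (self.capacity * 2)
            new_entries[: self.size] = self.entries
            self.copied += self.size
            self.entries = new_entries
            self.capacity *= 2
        self.entries[self.size] = entry
        self.size += 1


buf = GrowableSceneBuffer(4)
for i in range(9):
    buf.append(i)
print(buf.capacity, buf.copied)  # 16 12
```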
In the end, I really feel that while deferred rendering is not a useless concept, the hardware should not always attempt to cache the entire scene. That way it avoids massive frame-rate drops in specific scenarios, even if those scenarios are rare.
It also seems apparent that we don't really need much more fillrate. What we need are more complex shaders (more computational power, not necessarily more memory bandwidth) and more geometry.