Well, I'm not entirely certain what you're describing there.
What I was describing, very specifically, is an architecture that defers rendering until all depth data is readily available.
One thing you must realize is that for offline rendering, the drawbacks of tile-based rendering are relatively insignificant. That is, the primary drawback, as I stated, stems from the memory requirements of the scene buffer. The problem arises for realtime rendering when the scene buffer is overrun.
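To make that failure mode concrete, here is a minimal sketch (my own toy model in Python, not how any actual tiler is implemented) of a fixed-size scene buffer being filled with a frame's worth of primitive data, and the point at which it overruns:

```python
SCENE_BUFFER_CAPACITY = 8 * 1024 * 1024  # hypothetical 8MB scene buffer


class SceneBuffer:
    """Toy model of a tiler's scene buffer: the whole frame's primitives
    are captured (binned per tile) before any pixel is rendered."""

    def __init__(self, capacity=SCENE_BUFFER_CAPACITY):
        self.capacity = capacity
        self.used = 0

    def capture(self, primitive_bytes):
        """Returns False once the frame's scene data no longer fits --
        the overrun case that the options listed further down deal with."""
        if self.used + primitive_bytes > self.capacity:
            return False
        self.used += primitive_bytes
        return True
```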
To illustrate this, I'll use what is essentially a worst-case scenario. Imagine a game that averages 4MB of scene buffer data per frame, but spikes as high as 9MB per frame. The tiler in question uses an 8MB scene buffer (note that these numbers are quite high compared to the Kyro and modern games, but still nowhere near the memory bandwidth/size requirements of modern z-buffers). For this little thought exercise, let's say the gamer is cruising along at 1600x1200x32 with 6x FSAA.
Because of the efficiency of the tiler, it only needs double buffering at 1600x1200x32 (no extra memory for FSAA is needed). This means the video card only needs about 14.6MB of framebuffer data, plus the 8MB scene buffer, to store each frame. Obviously this is better than an immediate-mode renderer.
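Just to show where that 14.6MB comes from, here's the back-of-the-envelope arithmetic (my own sketch, assuming 4 bytes per pixel and 1MB = 2^20 bytes):

```python
def buffer_mb(width, height, bytes_per_pixel=4, count=1):
    """Size of `count` full-resolution buffers, in MB (2^20 bytes)."""
    return width * height * bytes_per_pixel * count / (1024 * 1024)


# Efficient case: the tiler only needs a front and a back buffer at
# 1600x1200x32, with no extra framebuffer memory for the 6x FSAA samples.
framebuffer = buffer_mb(1600, 1200, count=2)    # ~14.65 MB
scene_buffer = 8.0                              # MB
print(framebuffer, framebuffer + scene_buffer)  # ~14.65 MB, ~22.65 MB total
```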
That is, until the scene data spikes to 9MB in a single frame. Once that happens, big problems occur. Basically, there are three ways to handle this within the deferred renderer:
1. Just don't worry about it and let graphical errors creep into the scene. Obviously a big no-no.
2. Dynamically allocate more memory for the scene buffer. Again, this can be very bad, and would likely cause a very noticeable stall in performance.
3. Write out an external, full-size z-buffer and frame buffer, clear the scene buffer, and then write a new set of tiles. This is apparently what the Kyro line does when this particular problem occurs. In the situation illustrated above, this will require about 58.6MB of framebuffer data, plus the 8MB for the scene buffer (6x for the back buffer, 1x for the z-buffer, 1x for the front buffer...or even more if downsampling is not done on buffer swap); the arithmetic is worked out in the sketch after this list.
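And the same arithmetic for the fallback in option 3 (again my own rough sketch, assuming the 6x-sample back buffer stays resident at full size until the downsample on buffer swap):

```python
def buffer_mb(width, height, bytes_per_pixel=4, count=1):
    """Size of `count` full-resolution buffers, in MB (2^20 bytes)."""
    return width * height * bytes_per_pixel * count / (1024 * 1024)


# Fallback case: an external z-buffer and frame buffer get written out,
# so the card needs a 6x supersampled back buffer + 1x z + 1x front.
framebuffer = buffer_mb(1600, 1200, count=6 + 1 + 1)  # ~58.6 MB
scene_buffer = 8.0                                    # MB
print(framebuffer, framebuffer + scene_buffer)        # ~58.6 MB, ~66.6 MB total
```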
So, in the end, what we have is a video card that, due to the worst-case scenario of its method of rendering, will essentially always need to set aside more video memory for the frame buffer than an immediate-mode renderer, since that memory has to be reserved whether or not the spike ever happens, and that will take a very significant memory bandwidth hit whenever the worst case actually occurs. It does remain true that, with a scene buffer of this approximate size, the required memory bandwidth will still be less than that of an immediate-mode renderer, but that won't matter much, as it's the change in required memory bandwidth that is of significance here.
This is also why I feel a partial deferred renderer would be a good idea. Without attempting to buffer the entire scene, a partial deferred renderer wouldn't have the massive performance deltas that come from overrunning a specific limit built into either the drivers or the hardware.
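To be clear about what I mean by partial deferral, here's a rough sketch (purely my own interpretation, not any shipping design): cap how much scene data gets buffered, and when the cap is reached, render what has been captured so far and keep going, so hitting the limit costs an incremental flush rather than a sudden fallback:

```python
PARTIAL_BUFFER_CAPACITY = 2 * 1024 * 1024  # hypothetical, much smaller cap


def submit_frame(primitives, capacity=PARTIAL_BUFFER_CAPACITY):
    """Partial deferral: buffer primitives up to `capacity`, then flush
    (render that batch against a persistent z-buffer) and continue.
    Overrunning the cap costs an extra flush, not a wholesale mode switch."""
    batch, used = [], 0
    for prim in primitives:
        size = prim["bytes"]          # hypothetical per-primitive size field
        if used + size > capacity and batch:
            render_batch(batch)       # stand-in hook, not a real driver call
            batch, used = [], 0
        batch.append(prim)
        used += size
    if batch:
        render_batch(batch)           # render whatever is left at frame end


def render_batch(batch):
    """Stand-in for rasterizing the buffered primitives tile by tile."""
    pass
```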