Ailuros said:
A TBDR won't shade fragments that end up being occluded. An IMR often will, unless a "software deferred" rendering style is being used, ala Doom3. So in fragment shader-heavy scenes, a TBDR can achieve the same realized fillrate as an IMR that has greater fragment shader execution resources.
Let's make it more specific then: how large would you predict the difference in transistor count to be between a PS/VS 3.0 TBDR and an equivalent IMR?
Err. No idea.
I mean, first there's the fact that I don't think there's any fair definition of "equivalence" between a TBDR and an IMR, because the efficiency advantages of a TBDR vary depending on the scene data, rendering techniques, and settings in question. What's the level of pure overdraw inherent in the scene? Are the polys sent to the card in back-to-front order, front-to-back, or somewhere in between? Is a Doom3 style z-only first pass being used? Is the IMR bandwidth-limited or fillrate-limited?
Note that a TBDR's efficiency benefits come only on the rasterize/render side of the rendering process, not on the geometry side. So a TBDR's benefits should tend to grow at higher resolutions. (I think. I haven't thought that through completely.) Also note that, merely by devoting enough on-chip hardware, a TBDR has the option of providing multisampling AA entirely on the chip, and thus at essentially no performance cost. (The performance hit of MSAA on an IMR comes from the extra off-chip bandwidth that is used.) So on the one hand, a well-designed TBDR is going to have a higher transistor count than otherwise, in order to take advantage of this feature, which will give it a huge benefit at high levels of MSAA, but no benefit at all if AA isn't turned on. On the other hand, a well-designed IMR is going to dedicate tons of logic to various workload-reducing algorithms which are unnecessary on a TBDR. And so forth.
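To make the MSAA-bandwidth point concrete, here's a back-of-envelope sketch. Every number in it (resolution, framerate, bytes per sample, overdraw factor) is an assumed placeholder, not a measured figure for any real chip; the point is only the shape of the comparison--an IMR's external traffic scales with sample count, while a TBDR that resolves in tile memory writes each final pixel once.

```python
# Illustrative sketch, not real hardware data: extra off-chip traffic
# for MSAA on an IMR vs. a TBDR that resolves entirely on-chip.

def imr_msaa_bandwidth_gb_s(width, height, samples, fps,
                            bytes_per_sample=8, overdraw=2.0):
    """Off-chip color+depth traffic for an IMR that reads and writes
    every sample to external memory (assumes 4B color + 4B depth,
    each touched sample written once and read once)."""
    samples_touched = width * height * samples * overdraw
    bytes_total = samples_touched * bytes_per_sample * 2 * fps
    return bytes_total / 1e9

def tbdr_msaa_bandwidth_gb_s(width, height, fps, bytes_per_pixel=4):
    """A TBDR keeps all samples in on-chip tile memory and writes only
    the resolved pixel, so traffic doesn't scale with sample count."""
    return width * height * bytes_per_pixel * fps / 1e9

# 1024x768 @ 60 fps with 4x MSAA -- assumed values throughout
print(round(imr_msaa_bandwidth_gb_s(1024, 768, 4, 60), 1))   # ~6.0 GB/s
print(round(tbdr_msaa_bandwidth_gb_s(1024, 768, 60), 2))     # ~0.19 GB/s
```

Under these toy assumptions the IMR's MSAA framebuffer traffic is over an order of magnitude larger, which is exactly where its MSAA performance hit comes from.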
Second, I really couldn't say as I don't have a good notion of what fraction of overall transistor budgets are currently dedicated to which functions on a GPU. For some ridiculous reason I can never understand, the consumer 3d hardware industry refuses to release any technical information whatsoever about their products (even though the competition could easily reverse-engineer most info and the industry moves so quickly that any "trade secret" embodied in a current product would be useless by the time it could be incorporated into a competitor's future chip). So, AFAIK, no such information is publicly available.
One could make reasonable guesses based on overall transistor counts for various GPUs of various configurations (assuming the configuration details are publicly available, which has proved famously untrue of a certain IHV's chips of late). But that wouldn't take you so far, and I'm not going to pretend I know how the upgrade to PS/VS 3.0 is going to affect transistor counts in the vertex and fragment shader pipelines. (I mean, yes, it primarily means the addition of texture address calculation and sampling units in the vertex shaders, and of the logic to implement dynamic flow control in the fragment shaders. But how a balanced design will be crafted from the combination of the existing shader pipelines and those new requirements is anybody's guess. There's more than one way to skin a cat. And, among other things, the IHVs (but not me) have run hundreds of thousands of cat-skinning simulations to help them decide on the best method.)
Essentially there are two cop-out approaches to answering your question. One is to note that a TBDR requires the same hardware resources on the geometry side, and fewer on the fragment side by the factor of the IMR's actual overdraw, to achieve the same performance. Factor in the cost of sorting/tiling logic, z-caches, etc. on a TBDR, and the cost of framebuffer compression, overdraw-reducing algorithms, and a hierarchical z-cache on an IMR, and you've got an answer rough enough to be pretty meaningless.
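The first cop-out, in arithmetic form (with made-up inputs; the function name and numbers are illustrative assumptions, not anyone's real design data):

```python
# Sketch of the "same performance, fewer fragment resources" argument,
# assuming the TBDR shades each visible fragment exactly once.

def equivalent_tbdr_pipes(imr_pipes, imr_overdraw):
    """Fragment pipes a TBDR would need to match an IMR's realized
    fillrate, under the idealized zero-wasted-shading assumption."""
    return imr_pipes / imr_overdraw

# e.g. an 8-pipe IMR whose scenes average 2.5x actual overdraw
print(equivalent_tbdr_pipes(8, 2.5))   # 3.2
```

That 3.2 is exactly the kind of number that looks precise but isn't: the real answer moves with the caveats listed above (the scene's overdraw, submission order, z-only passes, and the extra sorting/tiling hardware the TBDR itself needs).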
The other approach is to note that transistor count tracks closely (albeit not linearly) with IC cost. Unfortunately, the other determinants of chip cost are yields--about which we can't presume anything useful--and volume, partially because of volume fab discounts, but primarily in terms of how many chips you have to amortize design costs over. If we presume PowerVR is providing the TBDR, and ATI or Nvidia the IMR, then it's clear that PVR is going to sell fewer units and thus have higher costs for a chip with the same transistor count. Still, it's reasonable to assume that GPUs selling into the same market segment have broadly similar transistor counts. At which point your question devolves--with a great deal of fudging and hand-waving--into one about price/performance.
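The volume/amortization point can likewise be reduced to a one-line cost model. All inputs here are invented placeholders--no real die costs, yields, design budgets, or unit volumes are implied--the sketch only shows why the lower-volume vendor pays more per chip for the same silicon.

```python
# Toy per-chip cost model: silicon cost per good die, plus design cost
# amortized over unit volume. Every number below is a made-up example.

def per_chip_cost(die_cost, yield_rate, design_cost, volume):
    """Cost per sellable chip = die cost adjusted for yield, plus the
    vendor's design cost spread across every unit shipped."""
    return die_cost / yield_rate + design_cost / volume

# identical silicon and yield, but a 10x difference in volume
high_volume = per_chip_cost(30.0, 0.7, 50e6, 5_000_000)
low_volume  = per_chip_cost(30.0, 0.7, 50e6, 500_000)
print(round(high_volume, 2), round(low_volume, 2))
```

With these placeholder numbers the low-volume vendor's amortized design cost dominates, which is the squeeze a smaller IHV like PowerVR faces even at an identical transistor count.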