Could anyone with more TBDR experience confirm or deny what I've said?
Why you need a TBDR expert to answer your questions in the first place is beyond me; can I try, despite being a complete layman?
If you were hard-pressed to answer whether today's high-end cards are more fillrate or bandwidth "hungry", what would you say?
If the answer leans toward bandwidth, then why should it be different with upcoming products, given that the importance of arithmetic and computational efficiency will increase? Is the real problem of NV30 its bandwidth?
Furthermore, you're trying to compare a pure IMR with no advanced bandwidth-saving techniques (TNT) vs. a TBDR with equivalent on-paper specs (K2).
Let's take q3a as an example for those two; PowerVR claims to have measured an average overdraw of 3.39 in demo001.
TNT2
2*150MHz = 300MPixels/sec
183MHz SDRAM = 2.93GB/sec
KYROII
2*175*3.39 = 1186MPixels/sec
175MHz SDRAM *3.39 = 9.49GB/sec
Now the resulting numbers might seem exaggerated, but you need about 1.1GPixels/sec of raw fillrate in that game to reach 60 fps at 1280*1024*32. How far is a K2 from that number?
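For anyone who wants to check the arithmetic above, here's a rough sketch. It assumes a 128-bit SDR memory bus on both cards (the post doesn't state the bus width) and decimal GB; the 3.39 overdraw and the 1.1GPixels/sec target are simply taken as given, and all the names are made up for illustration.

```python
# Back-of-the-envelope sketch, not a benchmark.
# ASSUMPTION: both cards use a 128-bit (16-byte) SDR bus; this is not stated above.
BUS_BYTES = 16
OVERDRAW = 3.39        # PowerVR's claimed average overdraw for q3a demo001
TARGET_MPIX = 1100.0   # ~1.1 GPixels/sec figure quoted for 60 fps at 1280*1024*32


def fillrate_mpix(pipelines, core_mhz):
    """Raw fillrate in MPixels/sec: pipelines * core clock in MHz."""
    return pipelines * core_mhz


def bandwidth_gb(mem_mhz, bus_bytes=BUS_BYTES):
    """Raw SDR memory bandwidth in decimal GB/sec."""
    return mem_mhz * bus_bytes / 1000.0


# TNT2: a plain IMR, so the raw numbers are all it gets.
tnt2_fill = fillrate_mpix(2, 150)
tnt2_bw = bandwidth_gb(183)

# KYRO II: raw numbers scaled by overdraw to get "IMR-equivalent" figures.
k2_fill = fillrate_mpix(2, 175) * OVERDRAW
k2_bw = bandwidth_gb(175) * OVERDRAW

# How the K2's effective fillrate compares with the quoted target.
k2_vs_target = k2_fill / TARGET_MPIX

# Raw bandwidth available per raw pixel on K2, in bytes.
k2_bytes_per_pixel = bandwidth_gb(175) * 1000.0 / fillrate_mpix(2, 175)

print(f"TNT2: {tnt2_fill} MPixels/sec, {tnt2_bw:.2f} GB/sec")
print(f"K2:   {k2_fill:.1f} MPixels/sec, {k2_bw:.2f} GB/sec (effective)")
print(f"K2 effective fillrate / target: {k2_vs_target:.2f}")
print(f"K2 raw bytes per pixel: {k2_bytes_per_pixel:.1f}")
```

If you quote bandwidth in binary gigabytes instead, every GB/sec figure comes out roughly 7% lower.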
Nowadays, with IMRs using several combinations of advanced bandwidth-saving techniques, I'd say that calculations could become tricky. However, I'd think that PowerVR, like practically any other IHV, has done enough research into where future games/applications are heading and what they will require, and has picked the best possible solution for its architecture and NOT what everyone else would do or is doing.
But then again, should nVidia be *really* smart, and should their FSAA algorithm "revamp" actually mean an all-new algorithm (FAA? Who knows), then PowerVR would suddenly look quite bad indeed. But frankly, I doubt that, and I think Series 5 is unlikely to look really bad.
If you dedicate enough transistors, a TBDR can have MSAA as fast as FAA on an IMR, with at least the same number of samples. I'd speculate that by the time IMRs move to exotic algorithms like that, TBDRs most likely will too, just because it's cheaper to implement in hardware.
So following this, I see no reason why there couldn't be an 8-pipeline DX9 TBDR with a transistor count similar to the FX 5900's: 110 million. And this should be paired with memory similar to the FX's: roughly synchronous, 256-bit DDR memory.
Forget pipelines with PS/VS3.0 products, and the transistor count up there (although I have no idea what it looks like in reality) sounds quite low. Core and memory should run at isochronous speeds, and I'm afraid any of your guesstimates concerning bus width will be completely off-track ....
***edit: what's the ratio between fillrate and bandwidth on K2?
*runs for his life*