If true, that's a disaster in terms of bandwidth IMHLO (the L doesn't stand for lame but for layman's, heh). Unless of course we'll see some >16xAF sample mode "for free" (due to the bandwidth restriction). That last one is more of a joke, of course; I personally would get more use out of something like better filter kernels than out of more than 16x samples.
Otherwise, if it truly has something like 256 TMUs or equivalents, I doubt real-time fillrate could even peak at such heights.
One aspect nobody has asked about so far, and for which there's probably no data available anyway, is 8xMSAA performance on GF100. I'd dare to say it has 48 ROPs (wtf it would need 48 pixels/clock for is beyond my uneducated imagination), but the 8Z/8C note in Rys' diagram isn't particularly telling IMO in that department yet either.
Finally, if it ends up with something like 1200MHz GDDR5, the 384-bit bus "grants" it 230.4GB/sec, which might be sufficient and marks a roughly 50% increase over GTX285. That's closer to what Dally's PCGH interview allowed me to speculate, and way less than anything the 512-bit scenarios granted so far.
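For anyone who wants to check the arithmetic, here's a quick sketch of the bandwidth figure. It assumes GDDR5 moves 4 bits per pin per memory clock and GDDR3 moves 2; the GTX285 numbers (1242MHz GDDR3 on a 512-bit bus) are my assumption and aren't stated above, and `bandwidth_gb_s` is just an illustrative helper name.

```python
def bandwidth_gb_s(mem_clock_mhz, bus_width_bits, transfers_per_clock):
    """Peak memory bandwidth in GB/sec (1 GB = 10^9 bytes)."""
    # MHz * transfers/clock gives mega-transfers/sec per pin;
    # times bus width / 8 gives MB/sec; divide by 1000 for GB/sec.
    return mem_clock_mhz * transfers_per_clock * bus_width_bits / 8 / 1000

# Speculated GF100 config from the post: 1200MHz GDDR5, 384-bit.
gf100 = bandwidth_gb_s(1200, 384, transfers_per_clock=4)

# ASSUMPTION: GTX285 reference specs, 1242MHz GDDR3, 512-bit.
gtx285 = bandwidth_gb_s(1242, 512, transfers_per_clock=2)

increase_pct = (gf100 / gtx285 - 1) * 100
print(f"GF100:  {gf100:.1f} GB/sec")
print(f"GTX285: {gtx285:.1f} GB/sec")
print(f"Increase: {increase_pct:.0f}%")
```

With those assumptions the GTX285 lands at about 159GB/sec, so the increase comes out nearer 45% than a full 50%.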
Let's suppose the design is 700MHz with 48 ROPs/256 TMUs/512 SPs and the memory is the same as the 5870's (1.2GHz GDDR5), so 230.4GB/sec with a 384-bit memory controller.
With G92b, Nvidia had a design at 738MHz with 16 ROPs/64 TMUs/128 SPs and 1.1GHz GDDR3 (70.4GB/sec with a 256-bit memory controller).
Compared with G92b, GT300 would then have 2.85X the pixel fillrate / 3.8X the texel fillrate / 3.3X the memory bandwidth.
This logic is kinda simplistic, but things don't look so bad if the GDDR5 memory controller has good efficiency.
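The ratios above can be reproduced with a few lines. This is only a sketch of the same simplistic logic (peak fillrate = core clock x units, peak bandwidth = memory clock x transfers per clock x bus width); the `GpuSpec` class and its field names are made up for illustration, and the GT300 figures are the supposed ones from this post, not confirmed specs.

```python
from dataclasses import dataclass

@dataclass
class GpuSpec:
    core_mhz: float
    rops: int
    tmus: int
    mem_mhz: float
    bus_bits: int
    transfers_per_clock: int  # 2 for GDDR3, 4 for GDDR5

    def pixel_fill(self):
        """Peak pixel fillrate in Mpixels/sec."""
        return self.core_mhz * self.rops

    def texel_fill(self):
        """Peak texel fillrate in Mtexels/sec."""
        return self.core_mhz * self.tmus

    def bandwidth(self):
        """Peak memory bandwidth in GB/sec."""
        return self.mem_mhz * self.transfers_per_clock * self.bus_bits / 8 / 1000

# Supposed GT300 config (speculation) versus G92b as described above.
gt300 = GpuSpec(700, 48, 256, 1200, 384, 4)
g92b = GpuSpec(738, 16, 64, 1100, 256, 2)

pixel_ratio = gt300.pixel_fill() / g92b.pixel_fill()
texel_ratio = gt300.texel_fill() / g92b.texel_fill()
bw_ratio = gt300.bandwidth() / g92b.bandwidth()

print(f"pixel fillrate: {pixel_ratio:.2f}X")
print(f"texel fillrate: {texel_ratio:.2f}X")
print(f"bandwidth:      {bw_ratio:.2f}X")
```

Note these are peak numbers only; they say nothing about how efficiently either memory controller actually uses its bandwidth.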
Nvidia must do something to reduce the hit with 8xMSAA.
Anyway, my guess (wild guess) is that GT300 will be around 1.5X faster per MHz than a 5870 (4X AA).
The problem with the 5870 is that its performance improvement over a 4890 is not consistent. (It varies much more than what 2X the specs would normally give; I know about the bandwidth...)
I don't know why that is, but I guess it is either the geometry setup engine (the geometry/vertex assembler has the same performance as the 4890's) or something about the geometry shader performance?
I could only find 3DMark Vantage tests; if you check:
http://www.pcper.com/article.php?aid=783&type=expert&pid=12
GPU Cloth: 5870 is only 1.2X faster than 4890. (vertex/geometry shading test)
GPU Particles: 5870 is only 1.2X faster than 4890. (vertex/geometry shading test)
Perlin Noise: 5870 is 2.5X faster than 4890. (math-heavy pixel shader test)
Parallax Occlusion Mapping: 5870 is 2.1X faster than 4890. (complex pixel shader test)
It shouldn't be a problem with the efficiency of the dual rasterizer/dual SIMD engine setup, since the synthetic pixel shader tests are fine (more than 2X) while the synthetic geometry shading tests are only 1.2X.
Are these synthetic vertex/geometry shading tests so bandwidth limited that they deliver 1.2X instead of 2X?
Or are pixel shader tests like Parallax Occlusion Mapping not bandwidth limited at all? (Why else would they deliver more than 2X?)
And anyway, it is not logical for the 5870 to be extremely bandwidth limited.
Why would ATI waste transistor resources like that if the design is not going to deliver (to such a degree) because of the bandwidth?
Surely they would have used the transistor budget in a more efficient way.