And how large is that GF110 supposed to end up just to support a 512-bit bus, alongside the increased SP count and clocks needed to adequately saturate it?
Depends on which markets the chip is actually meant for.
The rumor mill says that it should be a GF104 derivative
GF104's SMs are 3*16 with no DP support. According to Anandtech each GF104 SM (48SPs, dual scheduler, 8 TMUs, no DP) is roughly 25% bigger than a GF100 SM (32SPs, 4TMUs, DP at 1/2 rate).
When I hear 512SPs my mind obviously doesn't go to a 3*16 but to a 2*16 configuration (512 is 16 SMs of 32 SPs, while it isn't a multiple of 48). The possibility that each SM could contain 8 TMUs this time doesn't justify in the least any supposed "GF104 derivative".
of a GT212 that was canceled when Fermi was announced, but now adapted for a DX11 pipeline instead of the original DX10.1.
I'd love to read that horseshit story in detail and how that is even possible in a sensible way. The possibility that NV might have diverted GT212 resources into a GF104 development team when they killed that project has absolutely nothing to do with the nonsense rumor above. And while we're at it, guess what: the vaporware GT212 (DX10.1) had neither 2*16 nor 3*16 SMs, if that helps.
It doesn't just share the same kind of "simple" SPs with the GT200 series, but TMU performance as well.
There's one major difference between GT200 and GF1xx TMUs in case you haven't noticed yet: in the latter the TMUs are "sitting in the SMs". Other than that, in all likelihood GF110 might have 8 TMUs/SM, but the TMU count neither has anything to do with GT200 nor makes the chip any sort of GF104 "derivative".
If it really contains 8 TMUs/SM it would mean that it was NV's only other option to increase performance by a noticeable notch compared to the GTX480 today.
And GF104 proved to be a better gaming part than GF100. So my reasoning is that we won't see a 512b GDDR5 memory controller on GF110. I don't even believe that GF110 would saturate Cayman's memory bandwidth if they shared the same 256b 6Gbps setup.
With a roughly 110% higher texel fill-rate compared to a GTX480, I'd suggest a healthy bandwidth increase over the latter's raw bandwidth. Has NVIDIA typically gone the high-frequency RAM route in the recent past, or is it just a weird coincidence that G80 was on a 384bit bus, G92 on 256bit and GT200 on 512bit? Convince me why a 512bit bus was a necessity on GT200 and why they didn't opt for a 384bit bus instead, and then it will be a lot easier to reach common ground on that one. As brute-force as the 512bit bus scenario might sound, it is rather typical for NVIDIA; no, it obviously isn't the most "elegant" or efficient solution, but that's beside the point.
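For reference, the raw bandwidth figures in this argument follow from bus width times per-pin data rate. A minimal Python sketch of that arithmetic; the function name is mine, and the 4 Gbps data rate on the 512-bit entry is purely a hypothetical placeholder, not a rumored spec:

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


configs = {
    "GTX 480 (384-bit @ 3.696 Gbps)": (384, 3.696),
    "256-bit @ 6 Gbps": (256, 6.0),
    # Hypothetical data rate, chosen only for illustration.
    "512-bit @ 4 Gbps (hypothetical)": (512, 4.0),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {memory_bandwidth_gbs(width, rate):.1f} GB/s")
```

The point of the comparison is only that a wider bus at a modest data rate and a narrower bus at a high data rate can land in the same ballpark; which one is cheaper in die area and power is a separate question.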
I'd also bet they would do a much better job by redesigning the poorly performing Fermi GDDR5 MC than by widening it, which would only push the consumption of an already power-hungry MC to new heights. The better way, in just the opposite direction, would be to do as AMD did with Barts and use an improved version of a 256b MC that satisfies the chip's needs. IMO GF104 shares the same MC as GF110 without a redesign (disregarding width).
See above. What's historically likelier, and was the MC on G8x/9x/2x0 broken too?
You can't ever say that a GPU needs X bandwidth. The optimal balance of BW, setup, ROPs, SIMD, etc changes for each triangle of the scene.
I personally obviously can't, but I'd imagine that each IHV's engineers run a long sequence of simulations well before release to find the "golden bandwidth spot" for each architecture.
For BW, the best model is the one I described. Workloads are usually pretty starkly BW limited when they are, so linear models (i.e. scene time = X / GPUclk + Y / BW) extrapolate well. If you can crank out pixels as fast as Cypress can, then you'll be 15-35% BW limited in games with 154 GB/s, depending on the title. So moving up to 230 GB/s will buy you 5-11% performance. Moving up to 308 GB/s will buy you 11-21% performance.
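To make the linear model concrete, here is a small Python sketch. It assumes one particular reading of "X% BW limited": that fraction of the baseline scene time scales inversely with bandwidth and the rest is unaffected. The function name and the normalization of baseline time to 1.0 are my own, and the endpoints it prints need not line up exactly with the ranges quoted above:

```python
def speedup_from_bandwidth(bw_limited_fraction: float,
                           old_bw: float,
                           new_bw: float) -> float:
    """Speedup under scene_time = X / GPUclk + Y / BW, with baseline time = 1.

    bw_limited_fraction is the share of baseline time spent in the Y / BW term.
    """
    clock_bound_time = 1.0 - bw_limited_fraction             # X / GPUclk, unchanged
    bw_bound_time = bw_limited_fraction * old_bw / new_bw    # Y / BW, rescaled
    return 1.0 / (clock_bound_time + bw_bound_time)


BASELINE_BW = 154.0  # GB/s, from the post above

for new_bw in (230.0, 308.0):
    for fraction in (0.15, 0.35):
        gain = speedup_from_bandwidth(fraction, BASELINE_BW, new_bw) - 1.0
        print(f"{fraction:.0%} BW limited, {new_bw:.0f} GB/s: +{gain:.1%}")
```

Note the diminishing returns built into the model: doubling bandwidth can never buy more than the BW limited share of the frame back.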
I really don't know what the total costs are. I do know that the other way to get 20% performance - adding ALUs/setup along with the more complex crossbars and support logic to feed them - is not cheap.
Some rumors want GF110 to have 8 TMUs/SM. With 512 SPs that would give you 128 TMUs, which could get you into the </=110% texel fill-rate increase region compared to GTX480. Of course that fill-rate increase won't buy them anywhere near as much of a performance increase, but my layman's imagination would suggest that such a fill-rate increase would also need a strong bandwidth increase.
And I'm only thinking of a 512bit bus because the rumors of a 2GB framebuffer keep getting repeated lately. What I can't figure out for the life of me is what they'd do with 64 ROPs if the ROPs remain the same at 8 per partition. If the entire chip still has 4 raster units, each capable of rasterizing 8 pixels/clock (32 pixels/clock total), the hypothetical additional 32 ROPs sound like vast overkill.
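A rough Python sketch of where that fill-rate figure comes from. The GTX480 side uses its shipping configuration (15 SMs with 4 TMUs each at a 700 MHz core clock); the rumored side assumes 16 SMs of 32 SPs with 8 TMUs each, and its clock is simply set equal to the GTX480's as a placeholder since no clock is known:

```python
def texel_fillrate_gtexels(sm_count: int, tmus_per_sm: int,
                           core_clock_mhz: float) -> float:
    """Peak bilinear texel rate in GTexels/s: one texel per TMU per clock."""
    return sm_count * tmus_per_sm * core_clock_mhz / 1000.0


gtx480 = texel_fillrate_gtexels(sm_count=15, tmus_per_sm=4, core_clock_mhz=700.0)

# Assumption: 512 SPs as 16 SMs x 32 SPs, 8 TMUs/SM, same clock as GTX480.
rumored = texel_fillrate_gtexels(sm_count=16, tmus_per_sm=8, core_clock_mhz=700.0)

increase = rumored / gtx480 - 1.0
print(f"GTX480:  {gtx480:.1f} GTexels/s")
print(f"Rumored: {rumored:.1f} GTexels/s ({increase:.0%} more at equal clocks)")
```

At equal clocks the ratio is just 128/60 TMUs, so the exact percentage depends entirely on whatever clock the final part ships with.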
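The mismatch is easy to see with a bit of arithmetic. This Python sketch assumes GF100-style 64-bit memory partitions with 8 ROPs and 256MB each (which is what GTX480's 384bit, 48 ROP, 1536MB layout works out to); whether a 512bit part would keep those ratios is exactly the open question:

```python
PARTITION_WIDTH_BITS = 64   # assumption: same partition width as GF100
ROPS_PER_PARTITION = 8      # from the post above
MB_PER_PARTITION = 256      # assumption: same per-partition memory as GTX480


def partitions(bus_width_bits: int) -> int:
    """Number of memory partitions for a given total bus width."""
    return bus_width_bits // PARTITION_WIDTH_BITS


def rop_count(bus_width_bits: int) -> int:
    """Total ROPs if every partition keeps 8 ROPs."""
    return partitions(bus_width_bits) * ROPS_PER_PARTITION


def raster_pixels_per_clock(raster_units: int = 4, pixels_per_unit: int = 8) -> int:
    """Pixels the raster units can emit per clock in total."""
    return raster_units * pixels_per_unit


for bus in (384, 512):
    rops = rop_count(bus)
    raster = raster_pixels_per_clock()
    framebuffer_mb = partitions(bus) * MB_PER_PARTITION
    print(f"{bus}-bit: {rops} ROPs vs {raster} pixels/clock rasterized, "
          f"{framebuffer_mb} MB framebuffer")
```

Under these assumptions a 512bit bus gives the rumored 2GB framebuffer naturally, but also twice as many ROPs as the rasterizers can feed with new pixels per clock.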