I think GF110 is simply GF100 done right (or less wrong...).
I remember neliz once hinted that GF100 actually has 128 TMUs and that it just has half of the TMUs per SM disabled for yield/power reasons.
Take GF100, replace its TMUs with those of GF104, maybe double the TMUs like some rumors suggested (or rather enable all of them if neliz was right), tweak the chip in a way that reduces defect rate and leakage and be done with it.
A chip with 512 CCs, 128 TMUs, 48 ROPs and a 384-bit MC, clocked at 750/1500 MHz core/shader with 1+ GHz GDDR5, could reach the rumored 20% higher performance in scenarios with lots of texture mapping/filtering going on:
+5-6% from higher clocks
+5-6% from the 16th SM
+5-10% from the improved and doubled TMUs
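For what it's worth, here is a quick sanity check of how those three guesses stack up. This is only a rough sketch: it assumes the gains are independent and simply multiply, which real workloads won't do exactly, and the function name `combined_gain` is made up for illustration.

```python
def combined_gain(gains):
    """Compound a list of fractional speedups (0.05 means +5%).

    Assumption: the gains are independent and multiply, which is
    a simplification of how real GPU workloads behave.
    """
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0

# Low and high ends of the estimates: clocks, 16th SM, TMUs.
low = combined_gain([0.05, 0.05, 0.05])
high = combined_gain([0.06, 0.06, 0.10])

print(f"combined gain: {low:.1%} to {high:.1%}")
```

That works out to roughly 16% to 24%, so the rumored 20% sits inside the range.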
I think a 512-bit MC is unlikely. GF100 is not bandwidth-limited, and unless they change the ratio of ROPs and L2 cache per memory controller, a 512-bit bus would mean 16 additional ROPs and 256 KB of additional L2 cache as well. Judging by the abysmal perf./mm² of GF106 compared to other GF10x chips, ROPs/L2 cache seem to occupy quite a few mm² without any noteworthy performance gains.
GF100 is already too big and power hungry. Adding even more stuff that does little to nothing for gaming performance while further increasing die area and power consumption sounds rather counter-productive to me.
The only possibility I see is that Nvidia halves the number of ROPs and the amount of L2 cache per 64-bit MC. Then it might be 32 ROPs, 512 KB L2 cache and a 512-bit bus. Not sure if that really makes sense, though.
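To make the ROP/L2 arithmetic explicit, here is a small sketch. It assumes ROPs and L2 scale strictly with the number of 64-bit memory controllers, using GF100's 48 ROPs and 768 KB of L2 over six controllers as the baseline; the function name `memory_config` and its parameters are just for illustration.

```python
# GF100 baseline: 384-bit bus = 6 x 64-bit controllers,
# 48 ROPs and 768 KB L2 in total.
ROPS_PER_MC = 48 // 6      # 8 ROPs per controller
L2_KB_PER_MC = 768 // 6    # 128 KB L2 per controller

def memory_config(bus_bits, rops_per_mc=ROPS_PER_MC, l2_kb_per_mc=L2_KB_PER_MC):
    """Return controller count, ROPs and L2 size for a given bus width.

    Assumption: ROPs and L2 are tied 1:1 to the 64-bit controllers.
    """
    mcs = bus_bits // 64
    return {"mcs": mcs, "rops": mcs * rops_per_mc, "l2_kb": mcs * l2_kb_per_mc}

gf100 = memory_config(384)              # 6 MCs, 48 ROPs, 768 KB
wide = memory_config(512)               # 8 MCs, 64 ROPs, 1024 KB
wide_halved = memory_config(512, 4, 64) # 8 MCs, 32 ROPs, 512 KB

print("extra ROPs:", wide["rops"] - gf100["rops"])
print("extra L2 (KB):", wide["l2_kb"] - gf100["l2_kb"])
print("halved ratio:", wide_halved)
```

With the unchanged ratio, 512 bit costs 16 more ROPs and 256 KB more L2. With the halved ratio, it ends up with fewer ROPs and less L2 than GF100 has today.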
Last but not least: the more changes they make, the longer and more expensive the chip development becomes. That's another reason why I think it makes sense for NV to stick to 384 bit for GF110.