Will Nvidia return to a 512-bit wide bus for GF110 / GTX 580?

I think GF110 is simply GF100 done right (or less wrong...).

I remember neliz once hinted that GF100 actually has 128 TMUs and that it just has half of the TMUs per SM disabled for yield/power reasons.

Take GF100, replace its TMUs with those of GF104, maybe double the TMUs like some rumors suggested (or rather enable all of them if neliz was right), tweak the chip in a way that reduces defect rate and leakage and be done with it.

A 512 CC, 128 TMU, 48 ROP, 384-bit MC chip clocked at 750/1500 MHz core/shader with 1+ GHz GDDR5 could reach the rumored 20% higher performance in scenarios with lots of texture mapping/filtering going on:
+5-6% from higher clocks
+5-6% from the 16th SM
+5-10% from the improved and doubled TMUs
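
As a rough sanity check on the three estimates above, here is a minimal sketch that compounds them. It assumes the gains are independent and multiply together, which is a simplification of mine and not something the rumors state; the function name is made up for illustration.

```python
# Compound the per-factor estimates from the post.
# Assumption: the three gains are independent and multiplicative.

def compound(gains):
    """Return the combined fractional gain of several fractional gains."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0

# Low and high ends of each range: clocks, 16th SM, TMUs.
low = compound([0.05, 0.05, 0.05])
high = compound([0.06, 0.06, 0.10])

print(f"combined gain: {low:.1%} to {high:.1%}")  # combined gain: 15.8% to 23.6%
```

Under that assumption the rumored 20% sits inside the range, and only in workloads where all three factors actually apply.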

I think a 512 bit MC is unlikely. GF100 is not bandwidth-limited, and unless they change the ROP and L2 cache ratio, it would mean 16 additional ROPs and 256 KB additional L2 cache as well. Judging by the abysmal perf./mm² of GF106 compared to other GF10x chips, ROPs/L2 cache seem to occupy quite a few mm² without any noteworthy performance gains.

GF100 is already too big and power hungry. Adding even more stuff that does little to nothing for gaming performance, while further increasing die area and power consumption, sounds rather counter-productive to me.
The only possibility I see is that Nvidia halves the number of ROPs and the L2 cache per 64-bit MC. Then it might be 32 ROPs, 512 KB L2 cache and 512-bit. Not sure if that really makes sense, though.
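
The ROP and L2 numbers above follow from a fixed ratio per 64-bit memory controller. A small sketch, assuming GF100's 48 ROPs and 768 KB L2 are spread evenly over its six 64-bit partitions (8 ROPs and 128 KB each); the helper name is hypothetical:

```python
# ROPs and L2 scale with the number of 64-bit memory controller partitions.
# Assumption: resources are split evenly across partitions, as on GF100.

def memory_config(bus_bits, rops_per_mc, l2_kb_per_mc, mc_bits=64):
    """Return (total ROPs, total L2 in KB) for a given bus width."""
    partitions = bus_bits // mc_bits
    return partitions * rops_per_mc, partitions * l2_kb_per_mc

gf100 = memory_config(384, rops_per_mc=8, l2_kb_per_mc=128)
wide = memory_config(512, rops_per_mc=8, l2_kb_per_mc=128)
halved = memory_config(512, rops_per_mc=4, l2_kb_per_mc=64)

print("GF100, 384-bit:       ", gf100)   # (48, 768)
print("512-bit, same ratio:  ", wide)    # (64, 1024)
print("512-bit, halved ratio:", halved)  # (32, 512)
```

At the same ratio a 512-bit bus adds 16 ROPs and 256 KB of L2; at half the ratio it ends up with fewer ROPs and less L2 than GF100 has today.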

Last but not least: The more changes they make, the longer and more expensive the chip development becomes. Another reason why I think it makes sense for NV to stick to 384bit for GF110.
 
On the other hand - will the 128 TMUs significantly impact performance if the GPU is still limited to 32 pixels per clock? Maybe G80-like TMUs or at least GF104-like TMUs would make more sense...
 
GF104's SMs are 3x16 SPs with no DP support. According to AnandTech, each GF104 SM (48 SPs, dual scheduler, 8 TMUs, no DP) is roughly 25% bigger than a GF100 SM (32 SPs, 4 TMUs, DP at 1/2 rate).

GF104 certainly does support native hardware DP. 16 of the SPs per SM do both DP and FP, 32 of the SPs are smaller in die size and do only FP. GF100's DP is much more powerful, but it's not gone from GF104.
 
One of the real issues with 512-bit memory interfaces is that they cannot be used for dual-GPU cards. There are limits on the trace length of GDDR5, and with 2x512-bit interfaces, that means 32 DRAMs per card. Very challenging to do.

Also, with a 512-bit card, you end up needing 16 DRAMs, each of which is 3-7W and costs $2-5. So you really do increase your cost rather dramatically.
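
To put numbers on this, here is a back-of-the-envelope sketch. It assumes each GDDR5 DRAM sits on a 32-bit slice of the bus (which is what 512-bit and 16 DRAMs implies) and reuses the 3-7W and $2-5 per-DRAM ranges quoted above; the function name is invented for illustration.

```python
# DRAM count, power and cost for a given memory bus width.
# Assumptions: one GDDR5 DRAM per 32 bits of bus; per-DRAM power and
# cost ranges are the rough figures from the post, not measured data.

def dram_budget(bus_bits, gpus=1, bits_per_dram=32, watts=(3, 7), dollars=(2, 5)):
    """Return chip count plus (min, max) power and cost for the whole card."""
    chips = bus_bits // bits_per_dram * gpus
    return {
        "chips": chips,
        "watts": (chips * watts[0], chips * watts[1]),
        "dollars": (chips * dollars[0], chips * dollars[1]),
    }

single_384 = dram_budget(384)
single_512 = dram_budget(512)
dual_512 = dram_budget(512, gpus=2)

for name, cfg in [("384-bit", single_384), ("512-bit", single_512), ("2x512-bit", dual_512)]:
    print(name, cfg)
```

With these inputs a single 512-bit card carries 16 DRAMs (48-112W, $32-80), versus 12 DRAMs for a 384-bit card, and a dual-GPU board needs 32.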

It's rather unlikely that ATI will go down this route, because it would totally kill their approach of hitting the high-end with a dual GPU card.

DK
 
GF104 certainly does support native hardware DP. 16 of the SPs per SM do both DP and FP, 32 of the SPs are smaller in die size and do only FP.
That would give 1/3 DP throughput, but in reality it's only 1/12. I don't know how it's done (I don't even know whether 32-bit int math is handled the same way), but in any case it looks quite different from GF100.
 
That would give 1/3 DP throughput, but in reality it's only 1/12.
Because Nvidia artificially limits the DP rate of consumer cards to 1/4 of their true capabilities, to ensure that people who need all the DP power they can get buy the expensive Quadro cards instead.
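
The arithmetic behind this exchange can be sketched with exact fractions. The 16-of-48 DP-capable SPs and the 1/4 cap are the posters' claims from this thread, not confirmed hardware details, and the variable names are mine.

```python
# DP:SP throughput ratio for a GF104 SM under the thread's claims.
# Assumptions: 16 of 48 SPs can issue DP at full rate, and the driver
# or hardware caps consumer DP at 1/4 of that (as claimed above).
from fractions import Fraction

dp_capable_share = Fraction(16, 48)  # share of SPs that can do DP
consumer_cap = Fraction(1, 4)        # claimed artificial limit

uncapped_ratio = dp_capable_share
effective_ratio = dp_capable_share * consumer_cap

print("uncapped DP:SP ratio: ", uncapped_ratio)   # 1/3
print("effective DP:SP ratio:", effective_ratio)  # 1/12
```

The numbers are at least self-consistent: a 1/3 hardware rate with a 1/4 cap gives the observed 1/12. That does not show the cap is how it is actually done.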
 
Quadro GF108/106 are 1/12 SP, too. In GF104 design one Vec16-ALU is capable of DP emulation, maybe with 4 clocks for one operation or like on AMD GPUs since RV670 with 4 ALUs calculating on one operation.
 
Quadro GF108/106 are 1/12 SP, too. In GF104 design one Vec16-ALU is capable of DP emulation, maybe with 4 clocks for one operation or like on AMD GPUs since RV670 with 4 ALUs calculating on one operation.

It's not emulation... it's full IEEE 754-2008 DP.
 
Quadro GF108/106 are 1/12 SP, too. In GF104 design one Vec16-ALU is capable of DP emulation, maybe with 4 clocks for one operation or like on AMD GPUs since RV670 with 4 ALUs calculating on one operation.
I wouldn't quite call that emulation if it works like that. Is Int-32 handled the same way? It looks to me like both the "combine 4 ALUs" and the "loop for four clocks" approaches need quite some changes to both the ALUs themselves and to instruction dispatch. But I haven't seen anywhere how it's done (I'm not sure how the GF100 DP rate is reduced either).
 
GF104 certainly does support native hardware DP. 16 of the SPs per SM do both DP and FP, 32 of the SPs are smaller in die size and do only FP. GF100's DP is much more powerful, but it's not gone from GF104.

I stand corrected.
 
That would give 1/3 DP throughput, but in reality it's only 1/12. I don't know how it's done (I don't even know whether 32-bit int math is handled the same way), but in any case it looks quite different from GF100.

Is it true that Nvidia really throttles DP performance in GF104? My thinking went more toward the possibility that 8x16 SPs -> 128 SPs -> ~64 DP-capable shaders, while GF100 has 480 DP-capable SPs out of the box. That gives 480/~64 -> only ~7.5x the rate at the same clock, but don't you think GF100 has a more capable scheduler, so that it reaches up to 12x more DP power? Yes, I know all FP SPs are usually 32-bit, but isn't there some "kind of magic" by which a slightly wider ordinary FP SP can be made to crunch much more data, just through a scheduler that does much better alignment than one prepared for processing only 32-bit FP?

GF104 is just a hybrid with GT200-class DP FP processing. It is in fact a quickly refitted GT212, which was prepared to launch in early Q3 2009, before Fermi; troubles with the 40nm node just didn't allow Nvidia to do that, as we saw from Fermi's delays. So now we thankfully get a GT212 that is capable of DX11 processing.

So GF100 isn't that much of a failure after all, just like NV30 wasn't (putting aside the commercial plummet). But Fermi made a much bigger leap from GT200 than NV30 did from the NV25 or even NV20 architecture. It's just that NV couldn't do as much design respinning on an extremely immature 40nm node, even a year after TSMC introduced it.

But why do we all expect GF110 to be the GF100 successor in HPC? Why shouldn't it be just a good-enough gaming card that offers high-end graphics performance without raising the original GF104/GF114 TDP to GTX 480 heights? +30% over GTX 470 performance at the same TDP (or even lower) would probably make it a decent Cayman competitor. A GF100 rework, if it ever happens, should wait for 28nm, but that would probably be the time to introduce the new name Kepler (a refurbished Fermi architecture at 28nm) into the scheme.
 
GF104 is just a hybrid with GT200-class DP FP processing. It is in fact a quickly refitted GT212, which was prepared to launch in early Q3 2009, before Fermi; troubles with the 40nm node just didn't allow Nvidia to do that, as we saw from Fermi's delays. So now we thankfully get a GT212 that is capable of DX11 processing.

Try repeating that a few more times. Maybe it'll move from the realm of fantasy into reality.
 
GF104 is just a hybrid with GT200-class DP FP processing. It is in fact a quickly refitted GT212, which was prepared to launch in early Q3 2009, before Fermi; troubles with the 40nm node just didn't allow Nvidia to do that, as we saw from Fermi's delays. So now we thankfully get a GT212 that is capable of DX11 processing.

Are you insane? Have you actually read anything on the GF104? It's as far from the rumored GT212 as you can possibly get!
 
Try repeating that a few more times. Maybe it'll move from the realm of fantasy into reality.

GT21x/DX10.1@40nm parts have been out there for quite some time. Just don't try to explain to someone who doesn't want to understand that it's nonsense to believe you can take a DX10.1 architecture and just glue DX11 capabilities on top of it. The vaporware GT212 was rumored to have 384 SPs, a number that just happens to fit one aspect of the GF104 specifications.

It's the same thing in reverse as when everyone used to say that Cypress is merely an RV770 with DX11 glued over it.

So GF100 isn't that much of a failure after all, just like NV30 wasn't (putting aside the commercial plummet). But Fermi made a much bigger leap from GT200 than NV30 did from the NV25 or even NV20 architecture. It's just that NV couldn't do as much design respinning on an extremely immature 40nm node, even a year after TSMC introduced it.

NV30 had a number of wrong design decisions. Since the latter is quite a vague description I could easily say that insisting on a very high complexity chip despite TSMC giving fairly early warning is a wrong design decision too. The question here would be if it even was possible for NV to change anything (if Anand's Evergreen article is 100% accurate) somewhere in early 2008.

Of course, parallels to NV30 aren't fair in this case. But it still raises the question of whether NV can and will adjust its strategy in the future, in order to have alternative options in case shit hits the fan again down the line.

But why do we all expect GF110 to be the GF100 successor in HPC? Why shouldn't it be just a good-enough gaming card that offers high-end graphics performance without raising the original GF104/GF114 TDP to GTX 480 heights? +30% over GTX 470 performance at the same TDP (or even lower) would probably make it a decent Cayman competitor. A GF100 rework, if it ever happens, should wait for 28nm, but that would probably be the time to introduce the new name Kepler (a refurbished Fermi architecture at 28nm) into the scheme.

I no longer consider it nonsense that NV might go for a more 3D-focused part with GF110 this time. However, considering the supposedly leaked details so far, it doesn't sound like a GF104 derivative at all. More like a "3D mostly GF100" with twice the TMU count per SM, but of course I could be completely wrong.

NV will of course introduce 28nm Fermi variants in the future, while Kepler, on the other hand (irrespective of the process used for it), will have as much in common with Fermi as the latter has with GT2x0.
 
So GF100 isn't that much of a failure after all, just like NV30 wasn't (putting aside the commercial plummet). But Fermi made a much bigger leap from GT200 than NV30 did from the NV25 or even NV20 architecture. It's just that NV couldn't do as much design respinning on an extremely immature 40nm node, even a year after TSMC introduced it.

Did you really just say that the NV30 wasn't a failure?

It failed massively. It was such a failure that it was canceled not long after the launch. Only about 100K were made!
 