NVIDIA GF100 & Friends speculation

You're being awfully generous to the 'press' with your expectations. A few folks might consider availability, but frankly most don't.

The gap between knowledge in industry and press is quite amazing.

David

I guess I'm an optimist, thinking (hoping) that the so-called press will act as journalists rather than simply report pre-prepared summaries and generalizations. Then again, so many of the so-called "press" are more accurately overglorified fan sites that give best-case results so they won't have to actually start paying for hardware and service.
 
Here's a thought on what maybe they should have gone with: GTX 460-480, with the 460s having 450-475 core clocks, the 470s 500-550 and the 480s 600-650.

Too close to the rumored 1/2 GF100 that might appear slightly later.
 
I guess I'm an optimist, thinking (hoping) that the so-called press will act as journalists rather than simply report pre-prepared summaries and generalizations. Then again, so many of the so-called "press" are more accurately overglorified fan sites that give best-case results so they won't have to actually start paying for hardware and service.

Exactly, which is why I said Nvidia will be better off with a PE part. But only if it leads to availability in the future, like with the 5870s. If it never materializes then it will be relegated to the annals of launch failures and revisited every six months for eternity. To be honest, it'll be a refreshing change from the constant harping on NV30; 2003 is so last decade.
 
I guess I'm an optimist, thinking (hoping) that the so-called press will act as journalists rather than simply report pre-prepared summaries and generalizations. Then again, so many of the so-called "press" are more accurately overglorified fan sites that give best-case results so they won't have to actually start paying for hardware and service.

The first reviews I'll be looking for are Rys' here and Damien's over at hardware.fr.

Damien's latest article: http://www.behardware.com/articles/782-1/nvidia-geforce-gf100-the-geometry-revolution.html . He obviously doesn't have all the data he could have for a more detailed analysis in this case, but his criticism and thinking are on the level I would want.
 
The first reviews I'll be looking for are Rys' here and Damien's over at hardware.fr.

Damien's latest article: http://www.behardware.com/articles/782-1/nvidia-geforce-gf100-the-geometry-revolution.html . He obviously doesn't have all the data he could have for a more detailed analysis in this case, but his criticism and thinking are on the level I would want.

Thanks, hadn't seen his piece. Interesting point he made about the IHVs trying fully decoupled texture units with R600 and G80 but then reverting with RV770 and GF100. Hadn't really thought about it that way before.
 
He described G80's TMUs as semi-decoupled.

Jawed

I didn't quite understand the distinction. It's not like R600's TMUs could service any arbitrary processor. There was a static mapping between the processors and which TMUs could serve them, IIRC.
 
I didn't quite understand the distinction. It's not like R600's TMUs could service any arbitrary processor. There was a static mapping between the processors and which TMUs could serve them, IIRC.


I think it's in the sense that ATI's architecture at the time could have any number of TMUs regardless of the SIMD count, whereas in G80 the two were tied together.
 
I didn't quite understand the distinction. It's not like R600's TMUs could service any arbitrary processor. There was a static mapping between the processors and which TMUs could serve them, IIRC.
RV630 has 3 SIMDs and 2 quad TUs. The static part was in terms of which quads of ALUs/register-files within a SIMD were served by which quad TU.

Jawed
 
RV630 has 3 SIMDs and 2 quad TUs. The static part was in terms of which quads of ALUs/register-files within a SIMD were served by which quad TU.

Jawed

Understood. Nvidia's TPCs achieved the same net effect, though, for all practical configurations. AMD's earlier approach would bear fruit if they decided to scale up just the texture units dramatically and in arbitrary ratios to the SIMDs, but that didn't happen.

I don't know how inefficient those setups were though. Does RV770 have considerably better texturing efficiency than RV670?
 
Understood. Nvidia's TPCs achieved the same net effect, though, for all practical configurations.
It is similar conceptually, but in R600 all SIMDs are dependent upon all TUs. G80 etc. have a very strong locality.

AMD's earlier approach would bear fruit if they decided to scale up just the texture units dramatically and in arbitrary ratios to the SIMDs, but that didn't happen.
For whatever reason ATI hasn't gone past 4:1 ALU:TEX. I can imagine that will happen some day, but I can't tell if the current architecture is a firm limitation. Otherwise, there doesn't seem to be much need for a dramatic scaling upwards in TUs.

I don't know how inefficient those setups were though. Does RV770 have considerably better texturing efficiency than RV670?
"Efficiency" in what terms?

Per unit RV670 is faster at fp16 filtering. Per mm² for bog standard 8-bit integer texture formats R700 is more efficient - but R700 is slower for fp16/fp32 filtering per unit.

I remember some tests seemed to show some corner cases on RV770 which appeared to perform better than RV670 - the dependent texturing in some FutureMark tests for example. I have the feeling that the extra latency of texturing in RV670 due to the non-local texturing was a factor. But who knows?

HD3870's fundamental failing was performance per mm² - slightly slower than 9600GT (both are about the same size on 55nm) - and most of the blame appears to be lack of z rate, not texturing throughput. Actual die size also appears to be a victim of the ring bus, but that's pretty fuzzy.

Jawed
 
For whatever reason ATI hasn't gone past 4:1 ALU:TEX. I can imagine that will happen some day, but I can't tell if the current architecture is a firm limitation. Otherwise, there doesn't seem to be much need for a dramatic scaling upwards in TUs.

Jawed

GF100 is now 8:1 ALU:TEX, and that's quite a change from GT200 with 3:1. Anyway, if you counted max theoretical FLOPS against texels/s, things would change quite a bit.
The 4870 had 1008 GFLOPS : 27.2 GTexels/s; the GTX 285 had 1063 GFLOPS : 51 GTexels/s. So in fact GF100 is just coming closer to the Radeons in ALU:TEX ratio. The GT200 cards probably couldn't use those 80 TMUs very effectively in real games anyway, so they were quite a waste.
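
For anyone who wants to redo that arithmetic, here is a rough Python sketch. It only divides the peak figures exactly as quoted in this post (they are not independently verified), and the names `cards` and `flops_per_texel` are made up for illustration.

```python
# Peak figures exactly as quoted in the post above (assumption: taken at
# face value, not checked against vendor spec sheets).
cards = {
    "4870": {"gflops": 1008.0, "gtexels": 27.2},
    "GTX 285": {"gflops": 1063.0, "gtexels": 51.0},
}


def flops_per_texel(gflops, gtexels):
    """Peak ALU FLOPs available per peak filtered texel."""
    return gflops / gtexels


# Throughput-based ALU:TEX ratio for each card.
ratios = {name: flops_per_texel(**c) for name, c in cards.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} FLOPs per texel")

# How much more ALU-heavy the quoted Radeon figures are than the GeForce ones.
relative = ratios["4870"] / ratios["GTX 285"]
print(f"4870 vs GTX 285: {relative:.2f}x")
```

With those quoted numbers the Radeon comes out at roughly 37 FLOPs per texel against roughly 21 for the GTX 285, which is the gap the unit-count ratios (3:1 vs 8:1) don't show on their own.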
 
The GT200 cards probably couldn't use those 80 TMUs very effectively in real games anyway, so they were quite a waste.
That must have been what they concluded. It may be that in the time since the release of the G80, shaders in games have changed sufficiently that their architecture was no longer put to as optimal use as they would like, hence the increase in shader processing power compared to texture filtering power.
 
Hello All!

It's my first post here at Beyond3D, although I've been a lurker for a long time.
I'm going out on a limb here and guessing GF100 is NOT the same chip as FERMI compute.
Why?

Because of these posts at Xtreme Systems:

http://www.xtremesystems.org/Forums/showpost.php?p=4243892&postcount=458

http://www.xtremesystems.org/Forums/showpost.php?p=4244268&postcount=491

http://www.xtremesystems.org/Forums/showpost.php?p=4244316&postcount=498

Gemini is always viewed as a dual Fermi. But Gemini is also a word for twin, and there are false twins. So my guess is that GF100 is GT300's twin, but not a 100% identical one. That's why they were presented at two different times. And internally at NVIDIA, Gemini might be the word for GF100.

With this in mind:
- GF100 might really be a smaller chip. Remember, the guy who claimed to be an ex-NVIDIA employee said GF100 doesn't like to be called fatty ;)
- Maybe software really is what is delaying the GF100 launch. Parallelizing geometry probably is not easy.

Cheers
 
There would be limits to what could be expected for TMU and ROP throughput gains when memory bandwidth grew modestly over GT200.
If one were to expect gains anywhere, it would either be in places where this is not as great a problem, or the TMUs and ROPs are made more efficient within the scope of the peak numbers provided, which Nvidia claims to have done.
 