NVIDIA GF100 & Friends speculation

That'd certainly be a great achievement if the performance delta stays the same or goes up this generation. And AMD's PR has been using their die size comparisons for several years now, yeah.

If those 470 benches are real, the performance delta would have just decreased by quite a lot, if not disappeared.
 
The post to which you're referring was a guess on his part, an unlikely one at that. We haven't seen ROPs clocked @ only 475MHz since the days of NV40.

GTX 470, potentially a $500 part, will not be out-performed by GTX 280, a part launched 21 months ago. It is illogical.

My post was based on info I stood behind at the time. It is entirely possible they increased the clocks for the ROPs. But at the time I posted, that was the info I had been given.
 
The 2*6pin connectors and 225W limit could be the reason :?:
But that would be a very lame reason :LOL:

Could be a means of creating a limit to help differentiate the two GF100 products. After all, who in their right mind would pay a premium for a top-tier card when a lesser (lower-cost) one could easily match its performance at a substantial savings? Maybe by limiting available OCing (and thus mem bandwidth), the performance delta between the products will be well defined (enough to justify the cost increase).
 
3200 MHz seems pretty low even for the salvage part considering 5870 runs it at 4800MHz.

I think it's low too, but remember GF100 has more memory and a wider memory interface than Cypress so I wouldn't be surprised if the memory clock is somewhat lower. 50% is a bit much though.
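To put rough numbers on the "wider interface, lower clock" trade-off, here is a minimal sketch of the usual peak-bandwidth arithmetic. The bus widths (320-bit for the GF100 salvage part, 256-bit for Cypress) are my assumptions and are not stated in this thread; the 3200 and 4800 figures are the effective data rates quoted above.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: memory interface width in bits (assumed values below).
    data_rate_mt_s: effective data rate in mega-transfers per second.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s / 1000.0


# Assumed bus widths; data rates are the ones quoted in the thread.
gtx470_bw = bandwidth_gb_s(320, 3200)
hd5870_bw = bandwidth_gb_s(256, 4800)

print(f"GTX 470 (assumed 320-bit @ 3200): {gtx470_bw:.1f} GB/s")
print(f"HD 5870 (assumed 256-bit @ 4800): {hd5870_bw:.1f} GB/s")
print(f"ratio: {gtx470_bw / hd5870_bw:.2f}")
```

If those widths are right, the wider bus makes up part of the clock deficit but not all of it.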
 
My post was based on info I stood behind at the time. It is entirely possible they increased the clocks for the ROPs. But at the time I posted, that was the info I had been given.

As I said already, 475MHz is not a logical clockspeed for 2010-era ROPs on an IHV's Flagship SKU's salvage part.
 
As I said already, 475MHz is not a logical clockspeed for 2010-era ROPs on an IHV's Flagship SKU's salvage part.

Which is true, but again, I merely posted to clarify I wasn't guessing but was going off info I had been given. Nice to see they did get the clocks up though.
 
I think it's low too, but remember GF100 has more memory and a wider memory interface than Cypress so I wouldn't be surprised if the memory clock is somewhat lower. 50% is a bit much though.

Depends on which way you look at it. It's 50% going from 3200 to 4800 but only 33% going from 4800 to 3200.
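The asymmetry is just a matter of which number goes in the denominator. A quick sketch (the helper name `percent_change` is mine, purely illustrative):

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


up = percent_change(3200, 4800)    # 3200 is the baseline
down = percent_change(4800, 3200)  # 4800 is the baseline

print(f"3200 -> 4800: {up:+.1f}%")
print(f"4800 -> 3200: {down:+.1f}%")
```

Same 1600 MHz gap either way; only the baseline changes.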
 
Which is true, but again, I merely posted to clarify I wasn't guessing but was going off info I had been given. Nice to see they did get the clocks up though.

It's possible those were the clocks A1 was hitting, but I don't think anyone expected A1 to be shipping silicon, especially not after hitting those clocks.

Depends on which way you look at it. It's 50% going from 3200 to 4800 but only 33% going from 4800 to 3200.

True.
 
It's possible those were the clocks A1 was hitting, but I don't think anyone expected A1 to be shipping silicon, especially not after hitting those clocks.

Possibly A2 as well. The computer show in January reportedly had a few Fermi cards running the sled demo, but the frame rates got really low at times.
 
Possibly A2 as well. The computer show in January reportedly had a few Fermi cards running the sled demo, but the frame rates got really low at times.

Which doesn't necessarily mean that clocks were inadequate, that could very well be a function of early drivers.
 
Possibly A2 as well. The computer show in January reportedly had a few Fermi cards running the sled demo, but the frame rates got really low at times.

The Rocket Sled demo is a performance monster - Tessellation and Physics at the same time. The only time the fps got real low was in the wireframe mode.

Seeing this article from Hexus, is it possible that it wasn't running on GF100?

(text is below the video)

http://www.hexus.net/content/item.php?item=22702


Only the raytracing demo runs on a GT200, too.
 
Could be a means of creating a limit to help differentiate the two GF100 products. After all, who in their right mind would pay a premium for a top-tier card when a lesser (lower-cost) one could easily match its performance at a substantial savings? Maybe by limiting available OCing (and thus mem bandwidth), the performance delta between the products will be well defined (enough to justify the cost increase).

Yep, you could be right. With the 225W limit they will cap the GPU clocks, and with the 3200 MHz low-voltage GDDR5 they could cap the memory bandwidth.
 
You know better than to use a single metric such as memory bandwidth to compare different products using different archs.. /slap ;)
In all honesty though, you can use just about any single metric and the GTX 470 would do better in practice than the theoretical figures suggest. As already mentioned, quite a bit less bandwidth (though I'm not convinced on the 800MHz GDDR5 yet). The figures floating around for ROP clocks seem sketchy, but even assuming 600MHz it's got less raw ROP throughput than a HD5870 (OK, not by that much - in line with those performance numbers). Texturing? Only half the theoretical texture fill rate. ALUs? Less than half the raw throughput.
It isn't surprising (GF100 really needs to achieve more of its theoretical potential compared to Cypress, otherwise it would be a horrendous disaster), plus Cypress doesn't fare well on that metric compared to Juniper either. Of course, in the end this metric isn't really relevant at all...
 
In all honesty though, you can use just about any single metric and the GTX 470 would do better in practice than the theoretical figures suggest. As already mentioned, quite a bit less bandwidth (though I'm not convinced on the 800MHz GDDR5 yet). The figures floating around for ROP clocks seem sketchy, but even assuming 600MHz it's got less raw ROP throughput than a HD5870 (OK, not by that much - in line with those performance numbers). Texturing? Only half the theoretical texture fill rate. ALUs? Less than half the raw throughput.
It isn't surprising (GF100 really needs to achieve more of its theoretical potential compared to Cypress, otherwise it would be a horrendous disaster), plus Cypress doesn't fare well on that metric compared to Juniper either. Of course, in the end this metric isn't really relevant at all...

So you assume that nVidia invested more than twice the transistors for less than 50% more speed. Even a GTX295 would be faster with fewer transistors...

BTW: How could they build smaller chips with this kind of perf/mm^2? They would be slower and bigger than a G92b...
 
So you assume that nVidia invested more than twice the transistors for less than 50% more speed. Even a GTX295 would be faster with fewer transistors...
I didn't assume that. That's just for the theoretical figures. ALUs are what got the biggest increase there, and even that is "only" a roughly 100% increase (depending on clocks). (Small nitpick: it isn't truly more than twice the transistors, since that would count disabled units too. That doesn't really have anything to do with the architecture itself - plus this is the GTX 470, which probably has a bit less than twice the "active" transistors of a GTX 285.) And clearly some of the transistors invested are for DX11 features, not for performance (as was the case with Evergreen), and others may help performance a lot but only under limited circumstances (like the distributed geometry processing - I really wonder how expensive this was in terms of transistor count).
BTW: How could they build smaller chips with this kind of perf/mm^2? They would be slower and bigger than a G92b...
In terms of theoretical specs? Absolutely. But they should achieve higher performance in practice. Plus it shouldn't be really worse in terms of perf/mm^2 even in theoretical terms thanks to 40nm vs 55nm manufacturing.

edit: just to clarify the hard numbers:
g92(b) vs. gf100 (assuming full chip):
754 vs 3000 million transistors (4x)
128 vs 512 alus (4x, but actually less due to lower clock and fewer sfus)
64 vs 64 tmus (1x, for some things like unfiltered fp16 it's 2x, and of course should be more efficient as well)
16 vs 48 rops (3x, probably less due to lower clock)
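For anyone who wants to check the multipliers, a small sketch that recomputes them from the unit counts listed above. It only covers raw counts, so the clock and efficiency caveats in the parentheses are not modelled; the dictionary layout is just my own illustration.

```python
# (g92b, gf100) unit counts as listed in the post; gf100 assumes the full chip.
specs = {
    "transistors (millions)": (754, 3000),
    "ALUs": (128, 512),
    "TMUs": (64, 64),
    "ROPs": (16, 48),
}

# Raw count ratio gf100 / g92b, ignoring clocks and per-unit efficiency.
ratios = {name: gf100 / g92b for name, (g92b, gf100) in specs.items()}

for name, ratio in ratios.items():
    g92b, gf100 = specs[name]
    print(f"{name:>24}: {g92b:>5} vs {gf100:>5}  ({ratio:.2f}x)")
```

Transistors come out just under 4x, so on raw counts only the ALUs keep pace with the transistor budget.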

Of course, if you look at die size instead of transistors, things get better for gf100 - but that would only be due to manufacturing differences.
Also, if you believe gf100 to be about 500mm^2, it's not really THAT big, and the smaller chips shouldn't have the problem of always needing disabled units (not only because they are smaller, but because nvidia will hopefully have fixed the design issues by the time they appear...). I don't doubt that nvidia will be able to do some half-fermi chip on 40nm with a similar die size to g92b (but apparently more transistors) which will beat the old-timer in practical performance (as well as obviously offering more features), something they apparently couldn't do with gt2xx...
 