NVIDIA GF100 & Friends speculation

rpg.314 · Mar 3, 2010

3200 MHz seems pretty low even for the salvage part considering 5870 runs it at 4800MHz.

GZ007 · Mar 3, 2010

rpg.314 said:
3200 MHz seems pretty low even for the salvage part considering 5870 runs it at 4800MHz.

The 2*6pin connectors and 225W limit could be the reason?

But that would be a very lame reason

Rangers · Mar 3, 2010

digitalwanderer said:
Is it just me or do those numbers almost look like they may just possibly be real?

If they are, Nvidia is in a whole lot of trouble. 470 would have to be priced at 299 (which was rumored?).

Rangers · Mar 3, 2010

DegustatoR said:
That'd certainly be a great achievement if the performance delta stays the same or goes up this generation. And AMD's PR has been using their die size comparisons for several years now, yeah.

If those 470 benches are real, the performance delta would have just decreased by quite a lot, if not disappeared.

XMAN26 · Mar 3, 2010

ShaidarHaran said:
The post to which you're referring was a guess on his part, an unlikely one at that. We haven't seen ROPs clocked @ only 475MHz since the days of NV40.

GTX 470, potentially a $500 part, will not be out-performed by GTX 280, a part launched 21 months ago. It is illogical.

My post was based on info I stood behind at the time. It is entirely possible they increased the clocks for the ROPs. But at the time I posted, that was the info I had been given.

FrameBuffer · Mar 3, 2010

GZ007 said:
The 2*6pin connectors and 225W limit could be the reason
But that would be a very lame reason

It could be a means of creating a limit to help differentiate the two GF100 products. After all, who in their right mind would pay a premium for a top-tier card when a lesser (lower-cost) one could easily meet or match its performance at a substantial savings? Maybe by limiting available OCing (and thus memory bandwidth), the performance delta between the products will be well defined (enough to justify the cost increase).

ShaidarHaran · Mar 3, 2010

rpg.314 said:
3200 MHz seems pretty low even for the salvage part considering 5870 runs it at 4800MHz.

I think it's low too, but remember GF100 has more memory and a wider memory interface than Cypress so I wouldn't be surprised if the memory clock is somewhat lower. 50% is a bit much though.
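To put rough numbers on the clock-versus-width trade-off: peak memory bandwidth is the effective data rate times the bus width. The following is only a sketch under stated assumptions. It treats the "MHz" figures in this thread as effective GDDR5 data rates, uses a 256-bit bus for Cypress, and assumes a 320-bit bus for the salvage part. The 320-bit figure is an assumption for illustration, not something confirmed in this thread, and `peak_bandwidth_gbs` is a hypothetical helper name.

```python
def peak_bandwidth_gbs(data_rate_mtps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mtps * 1e6 * bytes_per_transfer / 1e9


# HD 5870 (Cypress): 4800 effective, 256-bit bus
cypress = peak_bandwidth_gbs(4800, 256)
# Rumored salvage part: 3200 effective, bus width ASSUMED to be 320-bit
salvage = peak_bandwidth_gbs(3200, 320)

print(f"Cypress: {cypress:.1f} GB/s")
print(f"Salvage part (assumed 320-bit): {salvage:.1f} GB/s")
print(f"Salvage / Cypress: {salvage / cypress:.2f}")
```

Under those assumptions the wider bus recovers part of the gap, but the part would still have less raw bandwidth than Cypress (about 128 GB/s against 153.6 GB/s).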

ShaidarHaran · Mar 3, 2010

XMAN26 said:
My post was based on info I stood behind at the time. It is entirely possible they increased the clocks for the ROPs. But at the time I posted, that was the info I had been given.

As I said already, 475MHz is not a logical clockspeed for 2010-era ROPs on an IHV's Flagship SKU's salvage part.

XMAN26 · Mar 3, 2010

ShaidarHaran said:
As I said already, 475MHz is not a logical clockspeed for 2010-era ROPs on an IHV's Flagship SKU's salvage part.

Which is true, but again, I merely posted to clarify I wasn't guessing but was going off info I had been given. Nice to see they did get the clocks up tho.

XMAN26 · Mar 3, 2010

ShaidarHaran said:
I think it's low too, but remember GF100 has more memory and a wider memory interface than Cypress so I wouldn't be surprised if the memory clock is somewhat lower. 50% is a bit much though.

Depends on which way you look at it. It's 50% going from 3200 to 4800 but only 33% going from 4800 to 3200.
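The asymmetry comes from which clock is used as the baseline. A minimal sketch of the arithmetic (`percent_change` is a hypothetical helper, not anything from the thread):

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100


up = percent_change(3200, 4800)    # measured against the lower clock
down = percent_change(4800, 3200)  # measured against the higher clock

print(f"3200 -> 4800: {up:+.1f}%")
print(f"4800 -> 3200: {down:+.1f}%")
```

The absolute gap is 1600 in both directions; only the denominator changes.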

ShaidarHaran · Mar 3, 2010

XMAN26 said:
Which is true, but again, I merely posted to clarify I wasn't guessing but was going off info I had been given. Nice to see they did get the clocks up tho.

It's possible those were the clocks A1 was hitting, but I don't think anyone expected A1 to be shipping silicon, especially not after hitting those clocks.

XMAN26 said:
Depends on which way you look at it. It's 50% going from 3200 to 4800 but only 33% going from 4800 to 3200.

True.

XMAN26 · Mar 3, 2010

ShaidarHaran said:
It's possible those were the clocks A1 was hitting, but I don't think anyone expected A1 to be shipping silicon, especially not after hitting those clocks.

Possibly A2 as well. The computer show in January was reported as seeing a few Fermi cards running the sled demo, but the frame rates got real low at times.

neliz · Mar 3, 2010

XMAN26 said:
Possibly A2 as well. The computer show in January was reported as seeing a few Fermi cards running the sled demo, but the frame rates got real low at times.

seeing this article from Hexus, could it be possible that it wasn't running on GF100?

(text is below the video)

http://www.hexus.net/content/item.php?item=22702

ShaidarHaran · Mar 3, 2010

XMAN26 said:
Possibly A2 as well. The computer show in January was reported as seeing a few Fermi cards running the sled demo, but the frame rates got real low at times.

Which doesn't necessarily mean that clocks were inadequate, that could very well be a function of early drivers.

Sontin · Mar 3, 2010

XMAN26 said:
Possibly A2 as well. The computer show in January was reported as seeing a few Fermi cards running the sled demo, but the frame rates got real low at times.

The Rocket Sled demo is a performance monster - Tessellation and Physics at the same time. The only time the fps got real low was in the wireframe mode.

neliz said:
seeing this article from Hexus, could it be possible that it wasn't running on GF100?

(text is below the video)

http://www.hexus.net/content/item.php?item=22702

Only the raytracing demo runs on a GT200, too.

GZ007 · Mar 3, 2010

FrameBuffer said:
It could be a means of creating a limit to help differentiate the two GF100 products. After all, who in their right mind would pay a premium for a top-tier card when a lesser (lower-cost) one could easily meet or match its performance at a substantial savings? Maybe by limiting available OCing (and thus memory bandwidth), the performance delta between the products will be well defined (enough to justify the cost increase).

Yep, you could be right. With the 225W limit they will limit the GPU clocks, and with the 3200 MHz low-voltage GDDR5 they could limit the memory bandwidth.

mczak · Mar 3, 2010

FrameBuffer said:
you know better than use a single metric such as memory bandwidth to compare different products using different archs.. /slap

In all honesty though you can use just about any single metric and the gtx470 would do better in practice than what the theoretical figures suggest. As already mentioned, quite a bit less bandwidth (though I'm not convinced on the 800MHz GDDR5 yet). The figures floating around for rop clocks seem to be sketchy, but even assuming 600MHz it's got less raw rop throughput than a HD5870 (ok not by that much - in line with those performance numbers). Texturing? Only half the theoretical texture fill rate. ALUs? Less than half raw throughput.
It isn't surprising (GF100 really needs to achieve more of its theoretical potential compared to cypress, otherwise it would be a horrendous disaster), plus Cypress doesn't fare well on that metric compared to Juniper either. Of course, in the end this metric isn't really relevant at all...
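For the ROP comparison, theoretical pixel fill rate is just ROP count times ROP clock. A rough sketch, with the assumptions spelled out: the HD5870 figures (32 ROPs at 850 MHz) are its published specs, while the gtx470 line assumes 40 active ROPs at the 600MHz mentioned above. Both gtx470 numbers are guesses for illustration, and `pixel_fill_gpix` is a hypothetical helper.

```python
def pixel_fill_gpix(rops: int, clock_mhz: float) -> float:
    """Theoretical pixel fill rate in Gpixels/s, at one pixel per ROP per clock."""
    return rops * clock_mhz / 1000


hd5870 = pixel_fill_gpix(32, 850)  # HD5870: 32 ROPs at 850 MHz
gtx470 = pixel_fill_gpix(40, 600)  # ASSUMED: 40 active ROPs at 600 MHz

print(f"HD5870: {hd5870:.1f} Gpix/s")
print(f"gtx470 (assumed): {gtx470:.1f} Gpix/s")
print(f"gtx470 / HD5870: {gtx470 / hd5870:.2f}")
```

With those inputs the gtx470 lands a little under the HD5870 on paper, which matches the "not by that much" reading. A different ROP count or clock moves the result proportionally.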

rpg.314 · Mar 3, 2010

At this point, I need more benches to be convinced that these ones are real and representative.

Sontin · Mar 3, 2010

mczak said:
In all honesty though you can use just about any single metric and the gtx470 would do better in practice than what the theoretical figures suggest. As already mentioned, quite a bit less bandwidth (though I'm not convinced on the 800MHz GDDR5 yet). The figures floating around for rop clocks seem to be sketchy, but even assuming 600MHz it's got less raw rop throughput than a HD5870 (ok not by that much - in line with those performance numbers). Texturing? Only half the theoretical texture fill rate. ALUs? Less than half raw throughput.
It isn't surprising (GF100 really needs to achieve more of its theoretical potential compared to cypress, otherwise it would be a horrendous disaster), plus Cypress doesn't fare well on that metric compared to Juniper either. Of course, in the end this metric isn't really relevant at all...

So you assume that nVidia invested more than twice the transistors for less than 50% more speed. Even a GTX295 would be faster with fewer transistors...

BTW: How could they build smaller chips with this kind of perf/mm^2? They would be slower and bigger than a G92b...

mczak · Mar 3, 2010

Sontin said:
So you assume that nVidia invested more than twice the transistors for less than 50% more speed. Even a GTX295 would be faster with fewer transistors...

I didn't assume that. That's just for the theoretical figures. ALUs are what got the biggest increase there, and even that is "only" a roughly 100% increase (depending on clocks). (Small nitpick: it isn't truly more than twice the transistors, since that would count disabled units too. That doesn't really have anything to do with the architecture itself - plus this is gtx470, which probably has a bit less than twice the "active" transistors of a gtx285.) And clearly some of the transistors invested are for dx11 features, not for performance (as was the case with evergreen), and others may help performance a lot but only under limited circumstances (like the distributed geometry processing - I really wonder how expensive this was in terms of transistor count).

Sontin said:
BTW: How could they build smaller chips with this kind of perf/mm^2? They would be slower and bigger than a G92b...

In terms of theoretical specs? Absolutely. But they should achieve higher performance in practice. Plus it shouldn't be really worse in terms of perf/mm^2 even in theoretical terms thanks to 40nm vs 55nm manufacturing.

edit: just to clarify the hard numbers:
g92(b) vs. gf100 (assuming full chip):
754 vs 3000 million transistors (4x)
128 vs 512 alus (4x, but actually less due to lower clock and less sfus)
64 vs 64 tmus (1x, for some things like unfiltered fp16 it's 2x, and of course should be more efficient as well)
16 vs 48 rops (3x, probably less due to lower clock)

Of course, if you look at die size instead of transistors, things get better for gf100 - but that would only be due to manufacturing differences.
Also, if you believe gf100 to be about 500mm^2, it's not really THAT big, and the smaller chips shouldn't have the problem of always needing disabled units (not only because they are smaller, but because nvidia will hopefully fix the design issues by the time they appear...). I don't doubt that nvidia will be able to do some half-fermi chip on 40nm with a similar die size to g92b (but apparently more transistors) which will beat the oldtimer in practical performance (as well as obviously offering more features), something they apparently couldn't do with gt2xx...
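The multipliers in the g92(b) vs. gf100 list above can be reproduced directly from the unit counts. This is only a sketch of that arithmetic: it uses the figures exactly as posted (full gf100 assumed) and ignores the clock, SFU and efficiency caveats noted next to each line.

```python
# (g92b, gf100) figures as quoted in the post; full gf100 assumed
specs = {
    "transistors (millions)": (754, 3000),
    "ALUs": (128, 512),
    "TMUs": (64, 64),
    "ROPs": (16, 48),
}

# Raw unit-count ratio of gf100 over g92b, with no clock adjustment
ratios = {name: gf100 / g92b for name, (g92b, gf100) in specs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On raw counts, only the ALUs scale with the transistor budget; TMUs and ROPs scale less, which is the point of the comparison.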
