NVIDIA GF100 & Friends speculation

A LOT could be different for NVIDIA, with the hot-clocked ALUs as just one simple example. Different architectures, even more so.

I don't follow, here. I really don't see how different GPU architectures would affect how much more power would be required for a dual-GPU board. Well, maybe a few things related to board-level design, but hot-clocked ALUs?

As for the Evergreen comparisons, I'll twist it the other way around: if a 2GB 5870@850MHz has a TDP of 228W and I add a theoretical 50% to that, I end up at 342W. Minus the 14.7% frequency difference = 292W.

That's the catch with dumb layman's math exactly: all roads can lead to Rome :p

But when you start comparing single and dual-GPU boards with different clocks, you throw things off because of potential voltage differences. And I don't think looking at the 2GB HD 5870 is a good idea, because that's not relevant to the GTX 460/dual-GF104 case, where the amount of memory would double.

When estimating/speculating like this, as a general rule, you want as few different parameters as possible, and in that respect the HD 5850 is, in my opinion, the best choice: apart from a factor of 2 on just about everything you can double on a board, the only difference on the HD 5970 is 160 more SPs enabled on each GPU.
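
Just to make that layman's math explicit, here's a tiny sketch of the estimate (purely illustrative; the 50% dual-GPU overhead and linear clock scaling are assumptions, and the HD 5850/5970 TDP figures are the published ~151W/~294W ones from memory):

def dual_gpu_tdp_estimate(single_tdp_w, dual_overhead=0.50, clock_reduction=0.147):
    # Naive estimate: add a flat overhead for the second GPU, then assume
    # power drops linearly with the clock reduction.
    return single_tdp_w * (1.0 + dual_overhead) * (1.0 - clock_reduction)

# 2GB HD 5870 @ 850MHz, 228W TDP, clocked down 14.7% (to ~725MHz):
print(dual_gpu_tdp_estimate(228))   # ~291.7W, i.e. the 292W figure above

# The HD 5850 -> HD 5970 route keeps more parameters constant:
# naive doubling of ~151W gives 302W vs. the 5970's ~294W rating.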
 
A LOT could be different for NVIDIA, with the hot-clocked ALUs as just one simple example. Different architectures, even more so.

As for the Evergreen comparisons, I'll twist it the other way around: if a 2GB 5870@850MHz has a TDP of 228W and I add a theoretical 50% to that, I end up at 342W. Minus the 14.7% frequency difference = 292W.

That's the catch with dumb layman's math exactly: all roads can lead to Rome :p

If you're referring to the HD 5870 Eyefinity Six, then it's not only the additional 1 GB of memory that adds to the TDP; the card's TDP is also rated with all six display outputs in use, whereas a normal HD 5870 is rated with three. Only Dave will be able to tell us how much extra power enabling three more display outputs takes.

And as mentioned, a significant factor when selecting chips for mGPU is binning. A 10% reduction in voltage allows roughly a 21% reduction in power. If the dual-GPU card uses a single fan, it saves power there as well compared to two single cards (maybe 5-10 watts for the fan alone). The PLX chip probably does not consume more than a few watts. Also, only one of the GPUs has to drive the displays, I think, which again saves a few watts.
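
For reference, that voltage/power figure follows from assuming dynamic power scales roughly with some power of the core voltage (the exponent is an assumption; real silicon with leakage behaves less neatly):

def power_saving_from_undervolt(voltage_reduction, exponent=2.0):
    # Fractional power reduction for a given fractional voltage reduction,
    # assuming P scales as V^exponent at constant frequency.
    return 1.0 - (1.0 - voltage_reduction) ** exponent

print(power_saving_from_undervolt(0.10))                # ~0.19 -> about 19% with V^2
print(power_saving_from_undervolt(0.10, exponent=2.3))  # ~0.21 -> close to the 21% above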
 
If you're referring to the HD 5870 Eyefinity Six, then it's not only the additional 1 GB of memory that adds to the TDP; the card's TDP is also rated with all six display outputs in use, whereas a normal HD 5870 is rated with three. Only Dave will be able to tell us how much extra power enabling three more display outputs takes.

And as mentioned, a significant factor when selecting chips for mGPU is binning. A 10% reduction in voltage allows roughly a 21% reduction in power. If the dual-GPU card uses a single fan, it saves power there as well compared to two single cards (maybe 5-10 watts for the fan alone). The PLX chip probably does not consume more than a few watts. Also, only one of the GPUs has to drive the displays, I think, which again saves a few watts.

That's exactly why I call it dumb layman's math: there are a ton of factors one would have to take into account, and it's not as simple an equation as any of the above.

But when you start comparing single and dual-GPU boards with different clocks, you throw things off because of potential voltage differences. And I don't think looking at the 2GB HD 5870 is a good idea, because that's not relevant to the GTX 460/dual-GF104 case, where the amount of memory would double.

When estimating/speculating like this, as a general rule, you want as few different parameters as possible, and in that respect the HD 5850 is, in my opinion, the best choice: apart from a factor of 2 on just about everything you can double on a board, the only difference on the HD 5970 is 160 more SPs enabled on each GPU.

When you have a 160W TDP as a starting point, as with the GTX460, I don't see where the supposed impossibility lies in getting a GX2 out of that chip with slightly higher clocks that still stays within the 300W barrier. If the TDP were >=180W, I wouldn't have a single reason not to agree with you. In the meantime, rumors are increasing that the planned GX2 might have been canceled, which makes this debate sillier than it already is.

For what it's worth, the GTX460 was released in July (?), and unless only press samples were A1 and commercially available chips were A2 after all, who says a metal spin couldn't have given them a chance at even better parameters? Or, in the end, the thing is as problematic as GF100 and all 8 SMs can't be enabled without problems. From what I recall, the GTX295 chips had a B3 stamped on them, while the earlier GTX260/216 (GT200@55nm) were B2 chips with a 171W TDP @ 576/1242/896. Now fiddle around with it as much as you please: 289 isn't twice as much as 171.
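
Putting the numbers quoted above side by side (289W and 171W are the figures from this thread, nothing else assumed):

gtx260_216_tdp = 171.0          # W, GT200b B2 @ 576/1242/896
gtx295_tdp     = 289.0          # W, dual GT200b B3

print(2 * gtx260_216_tdp)             # 342.0 -> naive doubling
print(gtx295_tdp / gtx260_216_tdp)    # ~1.69x -> well short of 2x per chip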

If that table (http://en.wikipedia.org/wiki/GeForce_200_Series) is correct, then one of the results of going to B3 was a slight reduction in die area.
 
When you have a 160W TDP as a starting point, as with the GTX460, I don't see where the supposed impossibility lies in getting a GX2 out of that chip with slightly higher clocks that still stays within the 300W barrier. If the TDP were >=180W, I wouldn't have a single reason not to agree with you. In the meantime, rumors are increasing that the planned GX2 might have been canceled, which makes this debate sillier than it already is.

GTX460 hits a 180W TDP when it's clocked around 800 MHz (OC models from 780 to 810 MHz)
 
GTX460 hits a 180W TDP when it's clocked around 800 MHz (OC models from 780 to 810 MHz)

I never proposed anything above 725MHz, if you go back to the previous page. Wouldn't you say there's quite a difference between a ~7% frequency increase and the ~19% that 800MHz would amount to?
Besides, as I said, where's any guarantee that they would still have gotten to a GX2 with the same A1 chip?
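
For clarity, those percentages are relative to the GTX 460's stock 675MHz core clock (mentioned later in the thread):

stock_mhz = 675.0
for target_mhz in (725.0, 800.0):
    print(f"{target_mhz:.0f} MHz -> +{(target_mhz / stock_mhz - 1) * 100:.1f}%")
# 725 MHz -> +7.4%
# 800 MHz -> +18.5% (i.e. the ~19% figure)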

How about another perspective: a 104/GX2 would only really have been worth it if it had ended up about 25-30% faster than a GTX480. With 2*8 SMs and a frequency slightly above 700MHz, I'd say that's feasible. Anything less sounds like nonsense to me.
 
I never proposed anything above 725MHz, if you go back to the previous page. Wouldn't you say there's quite a difference between a ~7% frequency increase and the ~19% that 800MHz would amount to?
Besides, as I said, where's any guarantee that they would still have gotten to a GX2 with the same A1 chip?

How about another perspective: a 104/GX2 would only really have been worth it if it had ended up about 25-30% faster than a GTX480. With 2*8 SMs and a frequency slightly above 700MHz, I'd say that's feasible. Anything less sounds like nonsense to me.

I think 25~30% faster is overly optimistic: Damien recorded a SLI of stock 1GB GTX 460s as 14.4% faster than a GTX 480 at most (1920×1200, and FSAA 8X) and more like 10~12% on average. You'd need at least 12% higher clocks to meet that target, likely more, especially considering that (on Evergreen at least) dual-GPU cards don't seem to scale quite as well as a real dual-card setup, perhaps because of PCI-E bandwidth limitations. Granted, that may not necessarily apply to NVIDIA.
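
A quick sanity check of that clock requirement, assuming performance scales roughly linearly with core clock (optimistic) and using the SLI-vs-GTX 480 margins above as the baseline:

def extra_clock_needed(sli_vs_480, target_vs_480):
    # Fractional clock increase needed to move from the measured SLI advantage
    # over the GTX 480 to the target advantage, assuming linear clock scaling.
    return (1.0 + target_vs_480) / (1.0 + sli_vs_480) - 1.0

print(extra_clock_needed(0.12, 0.25))   # ~0.12 -> ~12% more clock for +25%
print(extra_clock_needed(0.10, 0.30))   # ~0.18 -> ~18% more clock for +30%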

Of course, with super-strict binning, anything's possible… but what about volume? And at what price?
 
I never proposed anything above 725MHz, if you go back to the previous page. Wouldn't you say there's quite a difference between a ~7% frequency increase and the ~19% that 800MHz would amount to?
That was just an FYI; I think 750MHz should be doable with decent power draw in everything but the most extreme tests.
Anyway, discussing NVIDIA's TDPs didn't work out last time either; we have different opinions, but we were both right :)

Besides, as I said, where's any guarantee that they would still have gotten to a GX2 with the same A1 chip?
I am still waiting for a GF100 revision too :)

How about another perspective: a 104/GX2 would only really have been worth it if it had ended up about 25-30% faster than a GTX480. With 2*8 SMs and a frequency slightly above 700MHz, I'd say that's feasible. Anything less sounds like nonsense to me.

Besides besting GF100, it also needs to be cost-efficient versus Cayman. With AMD still holding the performance crown with the 5970, a GF104 GX2 would probably be an option versus Cayman XT.
 
I think 25~30% faster is overly optimistic: Damien recorded a SLI of stock 1GB GTX 460s as 14.4% faster than a GTX 480 at most (1920×1200, and FSAA 8X) and more like 10~12% on average. You'd need at least 12% higher clocks to meet that target, likely more, especially considering that (on Evergreen at least) dual-GPU cards don't seem to scale quite as well as a real dual-card setup, perhaps because of PCI-E bandwidth limitations. Granted, that may not necessarily apply to NVIDIA.

Of course, with super-strict binning, anything's possible… but what about volume? And at what price?

http://www.hardware.fr/articles/796-9/test-geforce-gtx-460-sli.html

If that's the one you mean, the entire test doesn't put the GTX480 in a bad light compared to a 5970 at all, LOL, and what's more, 7SM 460s@675MHz aren't that bad against a 5970 either. The 5970 wins the power consumption test (albeit I have my own disagreement with system-level power measurements), but in terms of fan noise and temperatures the 460 SLI config doesn't do all that badly. Besides, the SLI scaling being measured hovers around 90%, which isn't exactly small.

As for your endless number crunching, I proposed a 7% higher core frequency and 8 SMs per core. We've all seen far worse performance increases from refresh GPUs in the past, at rather ridiculously higher prices.

I've heard the same questions about binning and volumes over and over again in the past, and about cost on top of that too. I guess I'd have to remind you again that each core on a 295 weighs in somewhere in the 470mm² region.

Besides besting GF100, it also needs to be cost-efficient versus Cayman. With AMD still holding the performance crown with the 5970, a GF104 GX2 would probably be an option versus Cayman XT.

I would never really expect more than the latter; and again, it would still be better than nothing.
 
http://www.hardware.fr/articles/796-9/test-geforce-gtx-460-sli.html

If that's the one you mean, the entire test doesn't put the GTX480 in a bad light compared to a 5970 at all, LOL, and what's more, 7SM 460s@675MHz aren't that bad against a 5970 either. The 5970 wins the power consumption test (albeit I have my own disagreement with system-level power measurements), but in terms of fan noise and temperatures the 460 SLI config doesn't do all that badly. Besides, the SLI scaling being measured hovers around 90%, which isn't exactly small.

Yes, sorry I meant to provide that link, but forgot. It's also on BeHardware.com in English. SLI has its benefits over a single-card, dual-GPU solution: lower thermal density, therefore typically lower noise, especially given the GTX 460's cooling system, and possibly slightly higher scaling.

That said, if NVIDIA were to decide on a big heatsink with a bunch of heatpipes or a vapor chamber and a 90mm fan, they could achieve similar thermal and noise levels on a dual-GF104. It might be expensive, though.

As for your endless number crunching, I proposed a 7% higher core frequency and 8 SMs per core. We've all seen far worse performance increases from refresh GPUs in the past, at rather ridiculously higher prices.

I've heard the same questions about binning and volumes over and over again in the past, and about cost on top of that too. I guess I'd have to remind you again that each core on a 295 weighs in somewhere in the 470mm² region.

The problem with enabling all 8 SMs is that power goes up just from enabling them. Not necessarily by a lot, but it could be disproportionately high because of intra-die variation, which would appear to be an issue on Fermi. But yes, enabling the eighth SM and bumping clocks by ~7% should be enough to meet your ~20% target.
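
The arithmetic behind that, assuming performance scales close to linearly with SM count and core clock (which it won't quite, in practice):

sm_gain    = 8 / 7        # enabling the eighth SM
clock_gain = 725 / 675    # 675 -> 725 MHz, i.e. ~7%
print((sm_gain * clock_gain - 1) * 100)   # ~22.7%, a theoretical upper bound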

As for the GTX 295, that is true, but GT200b was available with all SMs/TPCs enabled from the beginning, and at a very reasonable (considering the performance) 204W. This suggests that yields (and presumably binning) were less of an issue than they are now. More importantly perhaps, the GTX 295 was the fastest card around when it was introduced, which means that NVIDIA could price it nearly as high as they wished. But now, they have to worry about Cayman XT, which may or may not be faster than your proposed dual-GF104 solution, but most of all Antilles, which should pummel it.
 
I don't see NV having anything in the foreseeable future to battle Antilles, despite the latter probably delivering a smaller performance increase over Cayman than the 5970 did over the 5870, precisely due to power-consumption headaches. Unless, of course, Cayman has the exact same TDP as Cypress, which is a tad hard to swallow for me at this point.

The problem with enabling all 8 SMs is that power goes up just from enabling them. Not necessarily by a lot, but it could be disproportionately high because of intra-die variation, which would appear to be an issue on Fermi.

I estimated around a 175W TDP for an 8SM GF104@725MHz. If there's nothing wrong with the core itself, I doubt the power increase from enabling one disabled cluster is worth mentioning.
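
One way to land on ~175W with the same sort of layman's math, assuming power scales roughly linearly with core clock and the eighth SM only adds a handful of watts (pure speculation, not a measurement):

gtx460_tdp   = 160.0                    # W, 7SM GF104 @ 675MHz
clock_scaled = gtx460_tdp * 725 / 675   # ~171.9W at 725MHz
eighth_sm_w  = 4.0                      # assumed few extra watts for the 8th SM
print(clock_scaled + eighth_sm_w)       # ~175.9W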

As for the GTX 295, that is true, but GT200b was available with all SMs/TPCs enabled from the beginning, and at a very reasonable (considering the performance) 204W. This suggests that yields (and presumably binning) were less of an issue than they are now.

Why would there be any yield problems with a 360+mm² die on 40G nowadays? The 204W figure is for the GTX285; the GTX260/216 has a 171W TDP (same frequencies but slightly fewer units than the 295's chips). Add a few watts to the latter TDP and you're there, and that's still quite a bit over the 160W of a GF104.

The GF104 is a mid-range and not a high-end chip like the ones above (which were admittedly heavily cut down to fit the 295's power envelope). 2*460 in SLI managing to surpass a GTX480, albeit by a small margin, doesn't tell me personally that there's something wrong with GF104, but rather with GF100 itself. If things weren't the way they are now, we wouldn't even be debating the possibility of taking two mid-range chips to hypothetically battle it out on high-end ground. Unless my memory betrays me, every "GX2" so far has consisted of high-end chips.

Who knows: if TSMC hadn't canceled 32nm, NV might have considered a shrink of GF100 and then gone with a reduced version of the result for a GX2. Right now, under the constraints of 40G, it's either a 104/GX2 or an additional new chip to sustain a reasonable high-end presence for them over the next few quarters. Or else something to tide them over until 28nm (stars aligning and TSMC permitting) can kick in in the 2nd half of 2011.
 
I don't see NV having anything in the foreseeable future to battle Antilles, despite the latter probably delivering a smaller performance increase over Cayman than the 5970 did over the 5870, precisely due to power-consumption headaches. Unless, of course, Cayman has the exact same TDP as Cypress, which is a tad hard to swallow for me at this point.

Agreed. Well, I guess if Sideport made a comeback, it might help with scaling, but probably not by much.


I estimated around a 175W TDP for an 8SM GF104@725MHz. If there's nothing wrong with the core itself, I doubt the power increase from enabling one disabled cluster is worth mentioning.

I don't know. Dave seemed to think that GF100 might suffer from pretty severe intra-die variability, so perhaps that is true for GF104. It's really just speculation, though.


Why would there be any yield problems with a 360+mm² die on 40G nowadays? The 204W figure is for the GTX285; the GTX260/216 has a 171W TDP (same frequencies but slightly fewer units than the 295's chips). Add a few watts to the latter TDP and you're there, and that's still quite a bit over the 160W of a GF104.

Well, for one thing, there is still no full version of GF104, almost four months after launch. So that's worrying. Plus, so far NVIDIA has displayed a level of mastery of TSMC's 40nm process that is rather… underwhelming.

According to TechPowerUp, the GTX 460 only manages a ~27% increase in perf/W over the GTX 260-216. For comparison, the HD 5850 manages almost +79% over the HD 4830, and even more over other RV770-based SKUs. All those figures are for 1920×1200.
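
For what it's worth, a perf/W delta decomposes into a performance ratio and a power ratio; using the TDPs quoted in this thread (160W GTX 460, 171W GTX 260-216) as a rough stand-in for measured draw (an assumption, since TechPowerUp uses measured power):

def implied_perf_ratio(perf_per_watt_gain, power_ratio_new_over_old):
    # perf/W gain = (perf_new/perf_old) / (power_new/power_old), rearranged.
    return (1.0 + perf_per_watt_gain) * power_ratio_new_over_old

print(implied_perf_ratio(0.27, 160.0 / 171.0))   # ~1.19 -> GTX 460 ~19% faster on that basis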

I guess it could be the architecture, but I think NVIDIA's physical implementation for 40nm plays a part at the very least. It's worth noting that RV770 wasn't very power-efficient, though, so that makes AMD's progress somewhat artificially more significant.

The GF104 is a mid-range and not a high-end chip like the ones above (which were admittedly heavily cut down to fit the 295's power envelope). 2*460 in SLI managing to surpass a GTX480, albeit by a small margin, doesn't tell me personally that there's something wrong with GF104, but rather with GF100 itself. If things weren't the way they are now, we wouldn't even be debating the possibility of taking two mid-range chips to hypothetically battle it out on high-end ground. Unless my memory betrays me, every "GX2" so far has consisted of high-end chips.

Yep, I think that would be a first.

Who knows: if TSMC hadn't canceled 32nm, NV might have considered a shrink of GF100 and then gone with a reduced version of the result for a GX2. Right now, under the constraints of 40G, it's either a 104/GX2 or an additional new chip to sustain a reasonable high-end presence for them over the next few quarters. Or else something to tide them over until 28nm (stars aligning and TSMC permitting) can kick in in the 2nd half of 2011.

I think a new GPU would make more sense, maybe a scaled-up GF104 with 3GPCs…
 
According to TechPowerUp, the GTX 460 only manages a ~27% increase in perf/W over the GTX 260-216. For comparison, the HD 5850 manages almost +79% over the HD 4830, and even more over other RV770-based SKUs. All those figures are for 1920×1200.

I guess it could be the architecture, but I think NVIDIA's physical implementation for 40nm plays a part at the very least. It's worth noting that RV770 wasn't very power-efficient, though, so that makes AMD's progress somewhat artificially more significant.

Funny how you compare the 5850 to the 4830 for the perf/W increase, but fail to mention how power-efficient the 4830 and 4770 actually were.

http://www.techpowerup.com/reviews/Powercolor/HD_4890_PCS/29.html
 
Uh? I chose the most power-efficient version of RV770 to further prove my point that Cypress managed a much greater increase in power-efficiency over RV770 than GF104 did over GT200b.

Nvm, I was just showing that you made that claim invalid by saying that RV770 was not power-efficient.

So:
A: the 4830 was the most perf/W-efficient card (your statement A)
B: RV770 was not perf/W-efficient (while the numbers show otherwise) (your statement B)... that confused me.
 
Nvm, I was just showing that you made that claim invalid by saying that RV770 was not power-efficient.

So:
A: the 4830 was the most perf/W-efficient card (your statement A)
B: RV770 was not perf/W-efficient (while the numbers show otherwise) (your statement B)... that confused me.

Yes, that was a bit hasty on my part. RV770 proved to be quite power-efficient in the HD 4830 and even 4850 (so roughly 90~125W), but less so in higher parts of the performance-power spectrum (HD 4870 and 4890, even though that was actually RV790). In contrast, GT200b was really at home around 150~200W and displayed very decent power-efficiency.
 
I think a new GPU would make more sense, maybe a scaled-up GF104 with 3GPCs…

Depends on how much time they really had. I doubt it'll be a scaled-up 104, since I'd expect that if they use all the "104 ideas" for the high end, it'll most likely happen later on.

Well, for one thing, there is still no full version of GF104, almost four months after launch. So that's worrying.
And what would they do with all the GTX470 inventory? Breed on it? :LOL:

I guess it could be the architecture, but I think NVIDIA's physical implementation for 40nm plays a part at the very least.
Time is a major constraint with these things, as everyone knows. However, making a mistake once is dumb, while making the same one twice is dumber. The smaller GF10x variants appearing on A1 could indicate that things have gotten a lot better than they were at the start.
 