NVIDIA GF100 & Friends speculation

Yes, but they haven't spent transistors that are useless for graphics on this scale before. G80/GT200 were monsters, but had very little area devoted to stuff that graphics doesn't need.
Performance per mm² of GT200 on graphics is not competitive. Some of that, arguably, is down to lack of GDDR5 (i.e. efficiency gains that accrue from higher bandwidth per pin), but it's still way wide of the mark it needs to be to be competitive.

Indeed GF100 seems likely to be more efficient per mm² of excess over Cypress than GT200b is over RV790 - i.e. the die size difference between them has shrunk this generation but the performance advantage for NVidia's architecture remains the same or higher. That could be a direct result of GDDR5, but I expect architectural efficiency plays a significant part - and the joker's still in play...

I can't see why you'd disable half your TMUs for the top dog. This rumour doesn't make any kind of sense from any angle.
Rumours of GTX480 with only 480 MAD ALUs active didn't make any sense either (if the chip is "working fine", there's no way that such a SKU would have been planned). Yet they were taken seriously.

Jawed
 
My first guess would be that it is a bug in the code, rather than a precision problem. But that's just me.
To give you an example of how this sort of thing works out, consider the following example:

a = 1001.003
b = 1000.0

Now, these two numbers are represented in single-precision as:
a = 1001.00299072265625
b = 1000

b is basically exact, but the number a is off out at the seventh decimal place.

But what happens if I subtract them?

a - b = 1.00299072265625

Now my answer is only accurate out to four decimal places. If, by contrast, I perform the calculation in double precision, the answer remains accurate out to 13 decimal places.
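
To make this concrete, here's a minimal C sketch (nothing GPU-specific, just the two constants above) that reproduces the effect on any IEEE-754 machine:

Code:
#include <stdio.h>

int main(void)
{
    /* The two values from the example above. */
    float  af = 1001.003f, bf = 1000.0f;
    double ad = 1001.003,  bd = 1000.0;

    /* Printing with extra digits shows that the float cannot hold 1001.003
       exactly, and that the subtraction keeps the same absolute error while
       shrinking the value, so the relative accuracy drops. */
    printf("float  a     = %.13f\n", af);
    printf("float  a - b = %.13f\n", af - bf);
    printf("double a - b = %.13f\n", ad - bd);
    return 0;
}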
 
To give you an example of how this sort of thing works out, consider the following example:

a = 1001.003
b = 1000.0

Now, these two numbers are represented in single-precision as:
a = 1001.00299072265625
b = 1000

b is basically exact, but the number a is off out at the seventh decimal place.

But what happens if I subtract them?

a - b = 1.00299072265625

Now my answer is only accurate out to four decimal places. If, by contrast, I perform the calculation in double precision, the answer remains accurate out to 13 decimal places.

Besides this, there is also the issue of dealing with numbers very close to zero AND the fact that the mantissas are normalized, so precision may vary depending on how close you are to a power of 2.

Edit: close to zero, not small.
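
To put a number on that normalization point, a small C sketch (just an illustration, using the standard nextafterf from math.h) showing how the gap between adjacent single-precision values doubles when you cross a power of two:

Code:
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Distance to the neighbouring representable float just below and
       just above 2.0; the exponent steps up at the power of two, so the
       spacing (and hence the absolute rounding error) doubles there. */
    float gap_below = 2.0f - nextafterf(2.0f, 0.0f);  /* ~1.19e-7 */
    float gap_above = nextafterf(2.0f, 4.0f) - 2.0f;  /* ~2.38e-7 */

    printf("gap just below 2.0: %g\n", gap_below);
    printf("gap just above 2.0: %g\n", gap_above);
    return 0;
}

(Compile with -lm if your toolchain needs it.)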
 
Hmm, Germany didn't really have two fronts; it's more like one front with two focal points, since both battles actually had similar goals in the end, which was more land for Germany to lose.

you CAN always try to spin.


Not exactly; the same hardware units aren't doing two different things. Germany had two separate campaigns with different military divisions, which caused resources to be spread thin. By that logic, the nv30 had two fronts, because it had two separate hardware portions that were specialized for two different tasks (fixed-function and programmable shader units).

YOU can spin all you want; Fermi was made for gaming, and GPGPU is an extension of gaming with extensibility to other areas, however much you don't want it to be.
 
Is it not a discrete unit in the ALUs? If so, they can easily cut it out of the low- and mid-range parts.
This was all? :smile:
I don't think there would be anything easy about cutting the int units out. They might do it, but it would be a significant investment in resources, as well as confusing to developers.
 
Performance per mm² of GT200 on graphics is not competitive. Some of that, arguably, is down to lack of GDDR5 (i.e. efficiency gains that accrue from higher bandwidth per pin), but it's still way wide of the mark it needs to be to be competitive.

Indeed GF100 seems likely to be more efficient per mm² of excess over Cypress than GT200b is over RV790 - i.e. the die size difference between them has shrunk this generation but the performance advantage for NVidia's architecture remains the same or higher. That could be a direct result of GDDR5, but I expect architectural efficiency plays a significant part - and the joker's still in play...
The efficiency gains from caches are a one-time gain, just like GDDR5 was. When AMD builds a cache hierarchy, those relative gains will be lost. The int bloat will remain...

Overall, the gap has narrowed, but it is still substantial. And int mul doesn't appear to be helping. Assuming nv gets rid of the int mul in mainstream parts, the efficiency gap could probably be closed.

Rumours of GTX480 with only 480 MAD ALUs active didn't make any sense either (if the chip is "working fine", there's no way that such a SKU would have been planned). Yet they were taken seriously.
In defense of those who spread it/believed it, 480 is >90% of 512. No, I wasn't one of them.
 
Is it not a discrete unit in the ALUs? If so, they can easily cut it out of the low- and mid-range parts.
Maybe. Personally, I am doubtful whether the mainstream parts will lose the int mul. At any rate, it is a noticeable and unnecessary expense for graphics, even if only gf100 suffers from it.
 
Maybe. Personally, I am doubtful whether the mainstream parts will lose the int mul. At any rate, it is a noticeable and unnecessary expense for graphics, even if only gf100 suffers from it.

To remove that "bloat", Tesla and GeForce would have to use entirely different chips, which is far from cost-effective, and this is a money-making company we are talking about here.
 
The efficiency gains from caches are a one-time gain, just like GDDR5 was. When AMD builds a cache hierarchy, those relative gains will be lost. The int bloat will remain...

Overall, the gap has narrowed, but it is still substantial. And int mul doesn't appear to be helping. Assuming nv gets rid of the int mul in mainstream parts, the efficiency gap could probably be closed.


In defense of those who spread it/believed it, 480 is >90% of 512. No, I wasn't one of them.

Man, I can't wait for the 26th so these 480 rumors will finally die. 512SP = GTX480, not 480. Whoever started that should be hunted down; oh, I think it was Charlie.
 
Maybe. Personally, I am doubtful whether the mainstream parts will lose the int mul. At any rate, it is a noticeable and unnecessary expense for graphics, even if only gf100 suffers from it.
Another point is that nVidia typically makes workstation versions of their lower-end parts as well, where it would make very little sense to drop int support.
 
Man, I can't wait for the 26th so these 480 rumors will finally die. 512SP = GTX480, not 480. Whoever started that should be hunted down; oh, I think it was Charlie.


Good, then whoever is wrong should have to eat their shorts in a public viewing posted to a worldwide media outlet.

Edit: more on topic... but IF the part is indeed 512 CUDA cores and only manages to just squeak past the 5870 (+10%), then I'd say that's an even worse outlook than delivering "480SP" parts... ouch if indeed true.
 
To remove that "bloat", Tesla and GeForce would have to use entirely different chips, which is far from cost-effective, and this is a money-making company we are talking about here.

Which is why I can't see the point of doing this if the primary focus of gf100 was gaming. Otherwise, it makes sense.
 
Which is why I can't see the point of doing this if the primary focus of gf100 was gaming. Otherwise, it makes sense.

Because nVidia will sell GF100 as GeForce, Quadro and Tesla. Why would they have done all this work on geometry and tessellation performance if the primary purpose were not the gaming market?
 
Because nVidia will sell GF100 as GeForce, Quadro and Tesla. Why would they have done all this work on geometry and tessellation performance if the primary purpose were not the gaming market?

Under that assumption, gf100 is there to do the graphics R&D, and sell mainly as Quadro and Tesla.

Of course, you are free to present your own hypothesis regarding the quantity and quality of gf100's int implementation and its relation to nv's market strategy.
 
To give you an example of how this sort of thing works out, consider the following example:

a = 1001.003
b = 1000.0

Now, these two numbers are represented in single-precision as:
a = 1001.00299072265625
b = 1000

b is basically exact, but the number a is off out at the seventh decimal place.

But what happens if I subtract them?

a - b = 1.00299072265625

Now my answer is only accurate out to four decimal places. If, by contrast, I perform the calculation in double precision, the answer remains accurate out to 13 decimal places.

I don't get it. Wasn't a = 1001.003 before, and a = 1001.00299072265625 after?
It almost doesn't change the actual value of the number (even after hundreds of other operations) :LOL:
Also, why are we chasing decimal places and accuracy (7 vs 4 :?:) when both numbers have the exact same value minus the 1000 after the subtraction?
And as the whole physics engine is purely fictive, you have quite a lot of freedom to change things. Your smallest virtual physical unit could be 1, for example :rolleyes:.
 
Which is why I can't see the point of doing this if the primary focus of gf100 was gaming. Otherwise, it makes sense.

Of doing what? Designing a GPU that's good for both markets they are targeting, because that's the cost-effective way of doing it, instead of designing two entirely different chips, which would increase R&D costs?

That you consider those compute-specific bits "bloat" in graphics tasks, I can understand, but that you question why they did it, I really don't...
 
Under that assumption, gf100 is there to do the graphics R&D, and sell mainly as Quadro and Tesla.

GF100 is a gaming chip for the gaming market with some GPGPU functionality added. They use one chip for three different markets. They will make the same or more money in the Quadro and Tesla segments than in the high-end segment of their GeForce business.

Of course, you are free to present your own hypothesis regarding the quantity and quality of gf100's int implementation and its relation to nv's market strategy.

If it's not necessary for gaming, then they will cut it out for parts below GF100. They have discrete DP units in GT200 but not in their GT200 mainstream parts.
 
GF100 is a gaming chip for the gaming market with some GPGPU functionality added. They use one chip for three different markets. They will make the same or more money in the Quadro and Tesla segments than in the high-end segment of their GeForce business.



If it's not necessary for gaming, then they will cut it out for parts below GF100. They have discrete DP units in GT200 but not in their GT200 mainstream parts.

??? huh really ?
 
I don't get it. Wasn't a = 1001.003 before, and a = 1001.00299072265625 after?
It almost doesn't change the actual value of the number (even after hundreds of other operations) :LOL:
Also, why are we chasing decimal places and accuracy (7 vs 4 :?:) when both numbers have the exact same value minus the 1000 after the subtraction?
The point is this: let's say I perform some calculation, and its "true" result is 1001.003. If I use single-precision floating point, I can't actually represent that number. Instead it's represented as 1001.0029907 (with more decimal places after...). This isn't very bad, because it's within about one part in ten million of the true result (the accuracy of single precision).

But if I subtract it with another number that is close, I end up with a result that has much worse precision.

And as the whole physics engine is purely fictive, you have quite a lot of freedom to change things. Your smallest virtual physical unit could be 1, for example :rolleyes:.
Yeah, um, there's a reason why we use floating point numbers for physics simulations. Doing this would merely reduce the precision further.

But as I said, yes, there are a number of different ways to improve the precision of a result. The difficulty is that these methods always reduce performance by some amount, and they often take developer effort as well. Exactly how much the performance is reduced could vary from a minuscule change to a factor of a few slower.

Going to double precision is nice because it causes a dramatic increase in precision with essentially zero developer effort. There is a significant performance hit, of course, but the benefit is that the developer doesn't have to worry as much about precision.
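
For what it's worth, one textbook example of the kind of technique I mean (just an illustration; I'm not claiming any particular engine does this) is compensated, a.k.a. Kahan, summation, which spends a few extra float operations per term to keep accumulated rounding error small:

Code:
#include <stdio.h>

/* Naive accumulation: the rounding error grows with the number of terms. */
static float sum_naive(const float *x, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Kahan summation: a second variable tracks the rounding error of each
   addition so it can be fed back into the next one. */
static float sum_kahan(const float *x, int n)
{
    float s = 0.0f, c = 0.0f;
    for (int i = 0; i < n; ++i) {
        float y = x[i] - c;
        float t = s + y;
        c = (t - s) - y;  /* what was actually added minus what we meant to add */
        s = t;
    }
    return s;
}

int main(void)
{
    enum { N = 1000000 };
    static float x[N];
    for (int i = 0; i < N; ++i)
        x[i] = 0.1f;      /* the exact sum would be 100000 */

    printf("naive: %f\n", sum_naive(x, N));  /* noticeably off */
    printf("kahan: %f\n", sum_kahan(x, N));  /* ~100000 */
    return 0;
}

It's exactly the trade-off described above: more instructions and some developer effort in exchange for precision, where simply switching to double precision gets you most of the way with no code changes.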
 