AMD: R9xx Speculation

That's a very interesting claim, 'cos people like me don't upgrade ev'ry generation. So, this "power" will be helpful for the upcoming game releases.
We went ~8 years with one tri per clock (I think R300 was the first with that), so I don't think 2 tris per clock is going to be a problem anytime soon. Considering setup scaling issues and lower core clock speed of GF100/110, Cayman's real world disadvantage should be minimal.
 
Actually, Fermi has full rate 32-bit int add operations. I just wrote a CUDA kernel to test it out on my GTX 480, and got 644 Giga integer adds/second. The full-rate peak would be 1.4 GHz * 480 CUDA cores = 672 Giga integer adds/second.

Trying the same kernel out with 32-bit int mul operations gave 331 Giga integer muls/second, which does appear to be half rate.
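
For anyone who wants to check the arithmetic behind those two numbers, here is a minimal Python sketch. It only reuses the figures quoted in the post (1.4 GHz shader clock, 480 lanes, 644 and 331 Giga ops/second measured); the function names are made up for illustration, and "one op per lane per clock" is the assumed definition of full rate.

```python
# Figures as quoted in the post; nothing here is measured by this script.
SHADER_CLOCK_GHZ = 1.4   # GTX 480 shader clock, as quoted
ALU_LANES = 480          # assumed: one 32-bit op per lane per clock at full rate

def peak_gops(clock_ghz, lanes, ops_per_lane_per_clock=1.0):
    """Theoretical peak in Giga ops/second for a given issue rate."""
    return clock_ghz * lanes * ops_per_lane_per_clock

def fraction_of_peak(measured_gops, peak):
    """How close a measured rate gets to the theoretical peak."""
    return measured_gops / peak

peak = peak_gops(SHADER_CLOCK_GHZ, ALU_LANES)
add_frac = fraction_of_peak(644, peak)   # quoted int add result
mul_frac = fraction_of_peak(331, peak)   # quoted int mul result

print(f"peak: {peak:.0f} Gops/s")
print(f"int add: {add_frac:.1%} of full rate")
print(f"int mul: {mul_frac:.1%} of full rate")
```

The add result lands close to the full-rate peak and the mul result close to half of it, which is the reading given in the post.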
That's pretty interesting (though not really thread-relevant, I admit), since it contradicts these results: http://www.beyond3d.com/content/reviews/55/11. It would also mean the conclusion there about how int math is implemented is wrong.
 
Are those different?

I don't know that for a fact, but since the presentation doesn't give detailed specs (no exact shader count, BW, clocks, power) I'm thinking there might be two distinct dates, one for an architecture reveal and one for benchmarks and precise specs.

It could be just AMD trying to avoid leaks, though.

Edit: oh, just saw Neliz's post.
 
It appears pretty simple actually:

the leaked presentation is dated October 2010 at the bottom, with NDA date of 11/22

We know that things were delayed/pushed back, and the NDA was apparently pushed as well. If we look at the leaked 6990 slide, if true, they held another presentation November 18, 2010... where the specs were probably revealed.

So the TBD specs were likely meant to be disclosed far later (11/18) near launch, but since the cards were pushed back a couple of weeks, the official specs and changes remain under NDA
 
How and why would they make such a mistake?
Because the slides were put together in a hurry?
Why do they contain the same error (actually it is open to interpretation, it's just not precise) as the Cypress launch presentation slides (implying it can do 2 DP muls, while it can really do 2 adds or 1 mul in DP)? That was only corrected in later slides, but obviously they used some cut-and-paste from older ones.
They use the mantissa part of the FP unit, which can only do 24-bit INT unless they have 48-bit FP capability.
FMA?
 
Remember that single precision FMA only exists where double precision exists (which always has FMA).

Separately, my theory is that 32-bit int ADD per lane is possible because exponent processing requires addition, so the combination of mantissa and exponent processing delivers the requisite 32-bit capability for integer ADD (with a bit carried from mantissa into exponent, i.e. the exponent adder handles the upper 8 bits).
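
To make the theory concrete, here is a small Python sketch of a 32-bit add assembled from a 24-bit "mantissa" adder and an 8-bit "exponent" adder, with the carry passed from the low part into the high part. It is only a model of the idea in this post, not a description of how Fermi is actually wired; the function name and the masks are illustrative.

```python
MASK24 = (1 << 24) - 1
MASK8 = (1 << 8) - 1
MASK32 = (1 << 32) - 1

def add32_split(a, b):
    """32-bit wrapping add using a 24-bit adder plus an 8-bit adder."""
    a &= MASK32
    b &= MASK32
    # Low 24 bits go through the "mantissa" adder.
    lo_sum = (a & MASK24) + (b & MASK24)
    carry = lo_sum >> 24              # carry out of bit 23
    # High 8 bits go through the "exponent" adder, taking the carry in.
    hi_sum = ((a >> 24) + (b >> 24) + carry) & MASK8
    return (hi_sum << 24) | (lo_sum & MASK24)

# A carry that crosses the 24-bit boundary still gives the right result.
print(hex(add32_split(0x00FFFFFF, 0x00000001)))
```

The split version matches an ordinary 32-bit wrapping add for every input, since the only coupling between the two halves is that single carry bit.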
 
1000 transistors each? :rolleyes: I don't think so.
When all the big space consumers (data routing, flow control, registers, pipelining, etc.) are already in place, and you have 24-bit adders, yeah, marginal cost should be less than that. 8 more full adders will need under 150 transistors, and since it's got as much time to finish as an FP32 MAD you don't need any carry lookahead.

Raw math is a lot cheaper than you think.
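
A gate-level sketch of that point, in Python: a plain ripple-carry adder built from full-adder cells, with no lookahead, where going from 24 to 32 bits just means chaining 8 more cells. The helper names are made up for illustration, and the per-cell transistor count is not modelled here since it depends on the circuit style.

```python
def full_adder(a, b, cin):
    """One full-adder cell: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(x, y, width):
    """Ripple-carry add of two `width`-bit values.

    Returns (wrapped sum, number of full-adder cells used).
    """
    carry = 0
    result = 0
    cells = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
        cells += 1
    return result, cells

_, cells24 = ripple_add(0, 0, 24)
sum32, cells32 = ripple_add(0x00FFFFFF, 1, 32)
extra_cells = cells32 - cells24   # cells added to widen 24 -> 32 bits

print(hex(sum32), extra_cells)
```

The carry simply ripples through the extra cells, so the only cost of the wider add is the added cells and the longer carry chain.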
 
Well, the doubled triangle setup rate was more or less expected -- probably nothing less than that, in the face of Fermi's geometry showcase. But I guess the 32nm cancellation broke a lot of the more optimistic expectations across the board.
 
I don't think AMD and nVidia can rely forever on pouring in more and more raw power and on new process nodes. So they have to improve their architectures in order to offer more performance. In my view, Cayman is what was expected as far as paper specs go, but I honestly hope that it will not show disappointing* real world performance...


*something similar to the one-year-old Hemlock. :devilish:
 
Pressure: considering the 6990 slide is a fake, it isn't publicly known how fast the GDDR5 modules used will be...

Megadrive1988: Improved ROPs, CSAA and power management at this level are all unexpected things...
"Not-decoupled" TMUs means that the GPU doesn't have a 5:1 ALU:TEX ratio, so it will have more texturing power than expected. I'd say it's quite positive - not breathtaking, but slightly more positive than expected.
 
Slightly off, but do we expect Cayman to be able to run 3D Mark 2011 smoothly? And, one more thing: WTH is that stupid physics test doing in it? Is it strictly nVidia-oriented, or is this physics something different?
 