So, what do you think, guys? After having a glance at the tidbits of the new architecture, could this indeed be a new R300, as rumoured?
I think the performance of the cards would need to be seen first.
Indeed, but if AMD hasn't screwed up, it's possible to extrapolate rough estimates. I would be mighty surprised if, with such stats, the 6970 won't be faster than the GTX 580, while the 6990, as someone joked, would provide a "useless" amount of firepower.
Quote: I guess I meant 'no fullspeed 32bit INT ops'. The slide says:
4* 24bit MUL, ADD or MAD
2* 32bit ADD
1* 32bit MUL
I was hoping they would be doing fullrate 32bit rather than ganging the SPs. If 32bit INT isn't used all that much this should be OK though.

Don't forget that even with only one 32bit int MUL per clock, the absolute number is still (somewhat) higher than what a GTX 580 can do (which is half-rate for 32bit int MUL). For 32bit int ADDs that's more than twice as fast as the GTX 580 (unless, of course, that's scalar, in which case it'll drop to one 32bit int ADD as usual). So that still looks plenty fast to me.
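To put rough numbers on that comparison, here's a back-of-the-envelope sketch in C. The Cayman figures (the rumoured 1920 ALUs, i.e. 480 VLIW4 units, at an assumed ~880 MHz) are guesses rather than confirmed specs; the GTX 580 side uses 512 ALUs at the 1544 MHz shader clock, with full-rate INT32 ADD and half-rate INT32 MUL.

Code:
/* Back-of-the-envelope INT32 throughput. Cayman: rumoured 1920 ALUs
 * (480 VLIW4 units) at an assumed ~880 MHz. GTX 580: 512 ALUs at the
 * 1544 MHz shader clock, full-rate INT32 ADD, half-rate INT32 MUL. */
#include <stdio.h>

int main(void) {
    const double cayman_units = 480.0, cayman_hz = 0.88e9;  /* assumed   */
    const double gf110_alus   = 512.0, gf110_hz  = 1.544e9; /* published */

    /* Per VLIW4 bundle per clock: the slide says 2 INT32 ADDs and 1 INT32
     * MUL; the correction further down says ADD runs in all four slots. */
    double cayman_add2 = 2.0 * cayman_units * cayman_hz;
    double cayman_add4 = 4.0 * cayman_units * cayman_hz;
    double cayman_mul  = 1.0 * cayman_units * cayman_hz;

    double gf110_add = 1.0 * gf110_alus * gf110_hz;
    double gf110_mul = 0.5 * gf110_alus * gf110_hz;

    printf("INT32 ADD: Cayman %.0f-%.0f G/s vs GTX 580 %.0f G/s\n",
           cayman_add2 / 1e9, cayman_add4 / 1e9, gf110_add / 1e9);
    printf("INT32 MUL: Cayman %.0f G/s vs GTX 580 %.0f G/s\n",
           cayman_mul / 1e9, gf110_mul / 1e9);
    return 0;
}

Under those assumptions the single MUL per bundle still edges out the GTX 580's half-rate MUL (roughly 422 vs 395 G/s), and if ADD really runs in every slot the ADD throughput lands at roughly twice the GTX 580's, which is the comparison being made above.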
Quote: It's not going to be an R300, because the GTX 580 is already out and it's not NV30.

And it can't be a new R300 (imho) anyway, since R300 was such a big leap in all areas - not only performance but also feature-wise. Cayman is probably a nice improvement in performance (and it could be a decent improvement in perf/W too, which is getting more important), but it doesn't really bring anything new to the table feature-wise, methinks.
Quote: 4* 24bit MUL, ADD or MAD
2* 32bit ADD
1* 32bit MUL
I was hoping they would be doing fullrate 32bit rather than ganging the SPs. If 32bit INT isn't used all that much this should be OK though.

That's a mistake. It can do four 32bit integer adds per cycle - an add can be done in each VLIW slot, same as in Cypress (and everything since R600).

Edit: the ISA looks like this (no difference between the different generations):

Code:
x: ADD_INT R0.x, R1.x, R2.x
y: ADD_INT R0.y, R1.y, R2.y
z: ADD_INT R0.z, R1.z, R2.z
w: ADD_INT R0.w, R1.w, R2.w
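As a plain-C illustration (not actual kernel source) of what "one add per slot" buys you: four independent 32-bit integer adds, like the unrolled function below, are exactly the kind of thing the VLIW compiler can co-issue in a single bundle, producing the x/y/z/w ADD_INT listing above.

Code:
#include <stdint.h>

/* Four independent INT32 adds - no result feeds another add, so a VLIW4
 * compiler is free to schedule them into one bundle, one ADD_INT per slot.
 * Plain C stand-in for illustration, not real shader/kernel code. */
void add4(uint32_t dst[4], const uint32_t a[4], const uint32_t b[4]) {
    dst[0] = a[0] + b[0];   /* slot x */
    dst[1] = a[1] + b[1];   /* slot y */
    dst[2] = a[2] + b[2];   /* slot z */
    dst[3] = a[3] + b[3];   /* slot w */
}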
Quote: the absolute number is still (somewhat) higher than what a GTX 580 can do (which is half-rate for 32bit int MUL).

Hmm, somehow I'd gotten it into my mind that NV is doing fullspeed 32bit INTs.
That's what they said; tests have shown something else, IIRC.
Quote: By the way, does anyone know when the NDA actually expires? I mean the NDA for this presentation, not benchmarks.

Unfortunately the people who know that also couldn't tell us even if they wanted to; NDAs are tricky that way.
Quote: While the 6990, as someone joked, would provide a "useless" amount of firepower.

That's a very interesting claim, 'cos people like me don't upgrade every generation. So this "power" will be helpful for the upcoming game releases.
Quote: That's what they said; tests have shown something else, IIRC.

Oh, you're right. The whitepaper said only DP can't be dual-issued, but two int instructions can. Either that's just not true, or it could be artificially limited for consumer parts?
Quote: That's a mistake. It can do four 32bit integer adds per cycle.

How and why would they make such a mistake?
I know that in compute, 32bit int is often used for indices into data structures.
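To make that concrete, here's a plain-C sketch of the bread-and-butter case, per-work-item address math; flat_index and load_element are made-up illustrative names, not any real API.

Code:
#include <stdint.h>

/* Turning a (row, col) pair into a buffer offset costs one INT32 MUL and
 * one INT32 ADD, and this happens for practically every memory access in
 * a compute kernel. Illustrative C only. */
static inline uint32_t flat_index(uint32_t row, uint32_t col, uint32_t width) {
    return row * width + col;
}

static inline float load_element(const float *buf, uint32_t row,
                                 uint32_t col, uint32_t width) {
    return buf[flat_index(row, col, width)];
}

When the indices are known to fit in 24 bits, OpenCL's mul24()/mad24() builtins should let this sort of math take the full-rate 24bit path quoted on the slide instead of the slower 32bit MUL.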
Quote: How and why would they make such a mistake?

They use the mantissa part of the FP unit, which can only do 24bit INT unless it has 48bit FP capability.
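To make the mantissa point concrete: an FP32 multiplier handles 24-bit operands (mantissa plus implicit bit), so a 24x24 integer multiply fits it directly, while a full 32x32 multiply has to be pieced together from narrower partial products. A rough C sketch of that decomposition - illustrative only, not the actual Cayman datapath - looks like this:

Code:
#include <stdint.h>

/* Low 32 bits of a 32x32 multiply, built only from multiplies whose
 * operands are 16 bits wide (comfortably within a 24-bit multiplier).
 * Illustrative decomposition, not the real hardware implementation. */
uint32_t mul32_from_narrow(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t lo  = a_lo * b_lo;                /* 16x16 partial product */
    uint32_t mid = a_lo * b_hi + a_hi * b_lo;  /* only its low 16 bits survive the shift */

    return lo + (mid << 16);                   /* == (a * b) & 0xFFFFFFFF */
}

Needing several narrow multiplies (plus the adds to combine them) for one 32bit product lines up with the slide listing four 24bit MULs but only one 32bit MUL per bundle.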
Quote: With FMA you need 48bit adders for correct results.

Ahh, but still, why would ATI be quoting 24bit only?
Quote: Anyway, 32bit adders are way too cheap to not include...

Perhaps for a few, but when you have 1920 of them?