So, what do you think, guys? After having a glance at the tidbits of the new architecture, could this indeed be a new R300, as rumoured?
I think the performance of the cards would need to be seen first.
Indeed, but if AMD hasn't screwed up, it's possible to extrapolate rough estimates. I would be mighty surprised if, with such stats, the 6970 won't be faster than the GTX 580, while the 6990, as someone joked, would provide a "useless" amount of firepower.
Quote: I guess I meant 'no fullspeed 32bit INT ops'. The slide says:
4* 24bit MUL, ADD or MAD
2* 32bit ADD
1* 32bit MUL
I was hoping they would be doing fullrate 32bit rather than ganging the SPs. If 32bit INT isn't used all that much this should be OK though.

Don't forget that even with only one 32bit int MUL per clock, the absolute number is still (somewhat) higher than what a GTX 580 can do (which is half-rate for 32bit int MUL). For 32bit int ADDs that's more than twice as fast as the GTX 580 (unless, of course, that's scalar, in which case it'll drop to one 32bit int ADD as usual). So that still looks plenty fast to me.
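To put rough numbers on that comparison, here's a back-of-the-envelope sketch in C. The Cayman figures (the rumoured 1920 ALUs, i.e. 480 VLIW4 units, at an assumed ~880 MHz) are guesses rather than confirmed specs; the GTX 580 side uses 512 ALUs at the 1544 MHz shader clock, with full-rate INT32 ADD and half-rate INT32 MUL.

Code:
/* Back-of-the-envelope INT32 throughput. Cayman: rumoured 1920 ALUs
 * (480 VLIW4 units) at an assumed ~880 MHz. GTX 580: 512 ALUs at the
 * 1544 MHz shader clock, full-rate INT32 ADD, half-rate INT32 MUL. */
#include <stdio.h>

int main(void) {
    const double cayman_units = 480.0, cayman_hz = 0.88e9;  /* assumed   */
    const double gf110_alus   = 512.0, gf110_hz  = 1.544e9; /* published */

    /* Per VLIW4 bundle per clock: the slide says 2 INT32 ADDs and 1 INT32
     * MUL; the correction further down says ADD runs in all four slots. */
    double cayman_add2 = 2.0 * cayman_units * cayman_hz;
    double cayman_add4 = 4.0 * cayman_units * cayman_hz;
    double cayman_mul  = 1.0 * cayman_units * cayman_hz;

    double gf110_add = 1.0 * gf110_alus * gf110_hz;
    double gf110_mul = 0.5 * gf110_alus * gf110_hz;

    printf("INT32 ADD: Cayman %.0f-%.0f G/s vs GTX 580 %.0f G/s\n",
           cayman_add2 / 1e9, cayman_add4 / 1e9, gf110_add / 1e9);
    printf("INT32 MUL: Cayman %.0f G/s vs GTX 580 %.0f G/s\n",
           cayman_mul / 1e9, gf110_mul / 1e9);
    return 0;
}

Under those assumptions the single MUL per bundle still edges out the GTX 580's half-rate MUL (roughly 422 vs 395 G/s), and if ADD really runs in every slot the ADD throughput lands at roughly twice the GTX 580's, which is the comparison being made above.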
Quote: It's not going to be an R300, because the GTX 580 is already out and it's not NV30.

And it can't be a new R300 (imho) anyway, since R300 was such a big leap in all areas - not only performance but also feature-wise. Cayman is probably a nice improvement in performance (and it could be a decent improvement in perf/W too, which is getting more important), but it doesn't really bring anything new to the table feature-wise, methinks.
Quote: 4* 24bit MUL, ADD or MAD
2* 32bit ADD
1* 32bit MUL
I was hoping they would be doing fullrate 32bit rather than ganging the SPs. If 32bit INT isn't used all that much this should be OK though.

That's a mistake. It can do four 32bit integer adds per cycle - an add can be done in each VLIW slot, same as in Cypress (and everything since R600).

Edit: the ISA looks like this (no difference between the different generations):

Code:
x: ADD_INT R0.x, R1.x, R2.x
y: ADD_INT R0.y, R1.y, R2.y
z: ADD_INT R0.z, R1.z, R2.z
w: ADD_INT R0.w, R1.w, R2.w
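As a plain-C illustration (not actual kernel source) of what "one add per slot" buys you: four independent 32-bit integer adds, like the unrolled function below, are exactly the kind of thing the VLIW compiler can co-issue in a single bundle, producing the x/y/z/w ADD_INT listing above.

Code:
#include <stdint.h>

/* Four independent INT32 adds - no result feeds another add, so a VLIW4
 * compiler is free to schedule them into one bundle, one ADD_INT per slot.
 * Plain C stand-in for illustration, not real shader/kernel code. */
void add4(uint32_t dst[4], const uint32_t a[4], const uint32_t b[4]) {
    dst[0] = a[0] + b[0];   /* slot x */
    dst[1] = a[1] + b[1];   /* slot y */
    dst[2] = a[2] + b[2];   /* slot z */
    dst[3] = a[3] + b[3];   /* slot w */
}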
Quote: the absolute number is still (somewhat) higher than what a GTX 580 can do (which is half-rate for 32bit int MUL).

Hmm, somehow I'd gotten it into my mind that NV is doing fullspeed 32bit INTs.
That's what they said; tests have shown something else, IIRC.
Quote: By the way, does anyone know when the NDA actually expires? I mean the NDA for this presentation, not benchmarks.

Unfortunately the people who know that also couldn't tell us even if they wanted to; NDAs are tricky that way.
Quote: While the 6990, as someone joked, would provide a "useless" amount of firepower.

That's a very interesting claim, 'cos people like me don't upgrade every generation. So this "power" will be helpful for the upcoming game releases.
Quote: That's what they said; tests have shown something else, IIRC.

Oh, you're right. The whitepaper said only DP can't be dual-issued, but two int instructions can. Either that's just not true, or it could be artificially limited for consumer parts?
Quote: That's a mistake. It can do four 32bit integer adds per cycle.

How and why would they make such a mistake?
I know that in compute, 32bit int is often used for indices into data structures.
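To make that concrete, here's a plain-C sketch of the bread-and-butter case, per-work-item address math; flat_index and load_element are made-up illustrative names, not any real API.

Code:
#include <stdint.h>

/* Turning a (row, col) pair into a buffer offset costs one INT32 MUL and
 * one INT32 ADD, and this happens for practically every memory access in
 * a compute kernel. Illustrative C only. */
static inline uint32_t flat_index(uint32_t row, uint32_t col, uint32_t width) {
    return row * width + col;
}

static inline float load_element(const float *buf, uint32_t row,
                                 uint32_t col, uint32_t width) {
    return buf[flat_index(row, col, width)];
}

When the indices are known to fit in 24 bits, OpenCL's mul24()/mad24() builtins should let this sort of math take the full-rate 24bit path quoted on the slide instead of the slower 32bit MUL.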
Quote: How and why would they make such a mistake?

They use the mantissa part of the FP unit, which can only do 24bit INT unless it has 48bit FP capability.
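To make the mantissa point concrete: an FP32 multiplier handles 24-bit operands (mantissa plus implicit bit), so a 24x24 integer multiply fits it directly, while a full 32x32 multiply has to be pieced together from narrower partial products. A rough C sketch of that decomposition - illustrative only, not the actual Cayman datapath - looks like this:

Code:
#include <stdint.h>

/* Low 32 bits of a 32x32 multiply, built only from multiplies whose
 * operands are 16 bits wide (comfortably within a 24-bit multiplier).
 * Illustrative decomposition, not the real hardware implementation. */
uint32_t mul32_from_narrow(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t lo  = a_lo * b_lo;                /* 16x16 partial product */
    uint32_t mid = a_lo * b_hi + a_hi * b_lo;  /* only its low 16 bits survive the shift */

    return lo + (mid << 16);                   /* == (a * b) & 0xFFFFFFFF */
}

Needing several narrow multiplies (plus the adds to combine them) for one 32bit product lines up with the slide listing four 24bit MULs but only one 32bit MUL per bundle.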
Quote: With FMA you need 48bit adders for correct results.

Ahh, but still, why would ATI be quoting 24bit only?
Quote: Anyway, 32bit adders are way too cheap to not include...

Perhaps for a few, but when you have 1920 of them?