AMD: R9xx Speculation

Such a 32-bit integer multiplication can be constructed from four 16-bit integer multiplies and a series of adds. I hoped 3 would be enough, or that it would at least be possible to get the full 64-bit result with one VLIW instruction group (which should be possible if the adders are fast and wide enough).
I see, thanks for the heads-up.

(A+B) * (C+D) = A*C + A*D + B*C + B*D

When the multiplication is split into N parts, it needs N^2 partial multiplications.
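To make the (A+B) * (C+D) expansion concrete, here is a minimal Python sketch that assembles the full 64-bit product of two unsigned 32-bit integers from four 16x16-bit partial products, with A and C as the high halves and B and D as the low halves. `mul16` is a hypothetical stand-in for a 16-bit hardware multiplier and the shifted adds play the role of the wide adders; nothing here is taken from AMD's actual datapath.

```python
MASK16 = 0xFFFF


def mul16(a: int, b: int) -> int:
    """Hypothetical 16x16 -> 32-bit hardware multiplier."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32_full(x: int, y: int) -> int:
    """Full 64-bit product of two unsigned 32-bit integers from 4 partial products."""
    assert 0 <= x < (1 << 32) and 0 <= y < (1 << 32)
    a, b = x >> 16, x & MASK16          # x = a * 2^16 + b
    c, d = y >> 16, y & MASK16          # y = c * 2^16 + d
    hi = mul16(a, c)                    # A*C, weight 2^32
    mid = mul16(a, d) + mul16(b, c)     # A*D + B*C, weight 2^16 (up to 33 bits)
    lo = mul16(b, d)                    # B*D, weight 2^0
    return (hi << 32) + (mid << 16) + lo


def partial_products(total_bits: int, limb_bits: int) -> int:
    """Splitting each operand into N limbs needs N^2 limb multiplies."""
    n = -(-total_bits // limb_bits)     # ceiling division
    return n * n
```

With N = 2 halves you get the four multiplies above; splitting into 8-bit quarters (N = 4) would already need 16.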
Noted :smile:

Slides are out:
Oh, and 4D! :LOL:
Nope, the good old-fashioned 5D ALUs.
Looks like no-X was right..

And about the two-rasterizer claim, I think this is the same as Cypress, which has 2 rasterizers.

Die size is 255mm², not 230!
 
Integer-to-float and float-to-integer conversions are handled by all 4 slots. Not jointly: each slot does one conversion, so the maximum throughput is 4 conversions per cycle. That's quite a bit faster than Evergreen (conversions only in the t unit).
[...]
Conversions seem to crop up as annoying serial-dependencies in compute kernels, so I suppose having lots of them is going to be quite useful.
 
Die size is 255mm², not 230!

Almost exactly the same as RV770, but on 40nm. Comparing the HD 6870 and 4870 will be interesting in that respect.

On the surface, that is, as far as raw specifications are concerned, the improvement is underwhelming:

800 -> 1120 SPs
850 -> 900 MHz

Same bus width, similar bandwidth, power… Comparing that to the actual performance improvement would be interesting indeed.
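Putting a number on the raw-spec gap: a quick back-of-the-envelope in Python, using only the SP counts and clocks quoted above and assuming, as is usual for these VLIW parts, that every SP can retire one multiply-add (2 FLOPs) per clock.

```python
# Assumption: one MAD (multiply + add = 2 FLOPs) per SP per clock.
FLOPS_PER_SP_PER_CLOCK = 2


def peak_gflops(sps: int, clock_mhz: float) -> float:
    """Peak single-precision GFLOPS = SPs * FLOPs per clock * clock."""
    return sps * FLOPS_PER_SP_PER_CLOCK * clock_mhz / 1000.0


old = peak_gflops(800, 850)     # figures as quoted in the post
new = peak_gflops(1120, 900)
ratio = new / old
print(f"{old:.0f} -> {new:.0f} GFLOPS, {ratio:.2f}x")
```

That is roughly 48% more peak ALU throughput, which is the yardstick the actual game results would be measured against.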
 
The same Watts for almost the same performance?

The most power-efficient Cypress against the least power-efficient Barts..

Can current Radeons execute different shader programs in different SIMD processors?

That's mostly an API thing, I guess, as the graphics pipeline does that all the time
(including feeding one program's output into another, which the compute pipeline really lacks).
 
 
Charlie was right about the 4 symmetric ALU rumour.
He was well-informed, but I think Napoleon at ChipHell was the first to bring up the possibility of this for the next family, some time last year. Reduction from 5 lanes to 4 lanes has been a recurring topic for a very long time.

Oh, and they're not symmetric. Transcendentals use only 3 lanes, for a start.
 
The same Watts for almost the same performance? :mad:

But, as Alexco pointed out, at significantly higher clocks.
So if we were to normalize (same process, same clock) and take advantage of whatever voltage reduction that would allow, my guess is that performance/Watt would show a more solid improvement (10-20%?).
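The reason normalizing clock and dropping voltage should help is the usual first-order CMOS dynamic-power relation, P ≈ C·V²·f. If performance scales linearly with clock, perf/W then goes as 1/(C·V²), so at equal switched capacitance the gain comes from the voltage alone. A toy Python model of that; it ignores leakage entirely, and the voltage ratios are hypothetical, not AMD figures.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """First-order dynamic power: switched capacitance * voltage^2 * clock."""
    return c * v * v * f


def perf_per_watt_gain(v_ratio: float, f_ratio: float = 1.0) -> float:
    """Perf/W at (v_ratio * V, f_ratio * f) relative to the baseline part.

    Assumes performance scales linearly with clock and leakage is ignored,
    so f_ratio cancels out and only the voltage ratio matters.
    """
    base = 1.0 / dynamic_power(1.0, 1.0, 1.0)
    new = f_ratio / dynamic_power(1.0, v_ratio, f_ratio)
    return new / base


for drop in (0.05, 0.10):   # hypothetical voltage reductions
    gain = perf_per_watt_gain(1.0 - drop) - 1.0
    print(f"{drop:.0%} lower voltage -> {gain:.1%} better perf/W")
```

In this toy model a 5% to 10% voltage drop is worth roughly 11% to 23% in perf/W, the same ballpark as the guess above; real parts leak, so treat it as an upper-bound intuition rather than a prediction.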
Of course they prefer to push clocks higher instead, to put some market distance between the HD6870 and the HD6850.

At the end of the day, it seems AMD has managed to improve both efficiency/mm² and efficiency/W, which is quite nice from a technical point of view and gives them some additional marketing flexibility. Good job.
 
Almost exactly the same as RV770, but on 40nm. Comparing the HD 6870 and 4870 will be interesting in that respect.

On the surface, that is, as far as raw specifications are concerned, the improvement is underwhelming:

800 -> 1120 SPs
850 -> 900 MHz

Same bus width, similar bandwidth, power… Comparing that to the actual performance improvement would be interesting indeed.

You're forgetting the 32 ROPs. They help a lot more in games than additional shader FLOPS.
I, for example, expect the 6850 to be quite close to the 6870 in the majority of games out there, although with DirectCompute MLAA, more post-processing and Bullet physics, the 6870 will surely pull ahead in the future.
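The ROP point is easy to quantify as peak fill rate, ROPs × clock. A small Python sketch; it assumes one pixel per ROP per clock, takes the 32 ROPs for Barts from the post, assumes 16 ROPs for RV770 (my understanding of that chip, not stated in this thread) and reuses the 850 and 900 MHz clocks quoted in the thread.

```python
def fill_rate_gpix(rops: int, clock_mhz: float) -> float:
    """Peak colour fill rate in Gpixels/s, assuming 1 pixel per ROP per clock."""
    return rops * clock_mhz / 1000.0


rv770 = fill_rate_gpix(16, 850)   # assumed 16 ROPs
barts = fill_rate_gpix(32, 900)   # 32 ROPs per the post
print(f"{rv770:.1f} -> {barts:.1f} Gpix/s ({barts / rv770:.2f}x)")
```

Under those assumptions peak fill rate roughly doubles, whereas the SP count only goes from 800 to 1120.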
 
Somehow I consider this to be the most important slide today:


[Slide image]


For starters, Antilles is indeed 2x Cayman.
But is Cayman derived from EG (Evergreen)?
 
It probably just means the 5800 'price segment' was split into Barts and Cayman segments.
 
OK, so:

— no slide so far has mentioned anything other than Barts, Cayman and Antilles;
— Barts is basically a tweaked and scaled-up Juniper with no big improvement;
— Juniper is said to be maintained;
— Dirk Meyer said the entire NI lineup would be released in 2010.

I'm starting to seriously think that this is it, Northern Islands = Barts + Cayman (and Antilles), and neither Juniper nor Redwood or Cedar will be replaced.
 
That is to be expected; the slide wouldn't look much different if the HD 5850 replaced the HD 6870.

I guess they tested those at 2560x1600 (as usual), and that plus the weird SIMD-to-ROP bottleneck that plagues all Fermi cards means not-so-good performance at this extreme resolution.
It probably just means the 5800 'price segment' was split into Barts and Cayman segments.
Yeah, you are probably right.
 
But it is not renormalizing to the range -pi/2 to +pi/2; it simply divides by 2pi. That's something one should be able to incorporate into the lookup tables, isn't it?
This is Cypress for SIN:

Code:
      1  x: MULADD      ____,  PV0.x,  (0x3E22F983, 0.1591549367f).x,  0.5      
      2  w: FRACT       ____,  PV1.x      
      3  z: MULADD      ____,  PV2.w,  (0x40C90FDB, 6.283185482f).y,  (0xC0490FDB, -3.141592741f).x      
      4  y: MUL         ____,  PV3.z,  (0x3E22F983, 0.1591549367f).x      
      5  t: SIN         R0.x,  PV4.y
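
The five Cypress bundles can be replayed on the CPU to check that they compute sin(x). One assumption drives this sketch: the final MUL by 1/(2pi) suggests the Cypress SIN op takes its argument in fractions of a full turn rather than in radians, so `hw_sin_revolutions` (hypothetical) models it that way. Constants are decoded from the hex bit patterns in the dump, and each result is rounded to float32 once per instruction; whether MULADD is fused is not modelled.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float to IEEE-754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def from_bits(bits: int) -> float:
    """Decode a 32-bit pattern as printed in the ISA dump."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


INV_2PI = from_bits(0x3E22F983)   # 0.1591549367
TWO_PI = from_bits(0x40C90FDB)    # 6.283185482
NEG_PI = from_bits(0xC0490FDB)    # -3.141592741


def hw_sin_revolutions(t: float) -> float:
    """ASSUMED Cypress SIN behaviour: input measured in full turns."""
    return math.sin(2.0 * math.pi * t)


def cypress_sin(x: float) -> float:
    a = f32(x * INV_2PI + 0.5)        # 1: MULADD  x/(2pi) + 0.5
    b = f32(a - math.floor(a))        # 2: FRACT   -> [0, 1)
    c = f32(b * TWO_PI + NEG_PI)      # 3: MULADD  -> [-pi, pi)
    d = f32(c * INV_2PI)              # 4: MUL     -> [-0.5, 0.5)
    return hw_sin_revolutions(d)      # 5: SIN
```

Read this way, bundles 1 to 3 are a generic wrap into [-pi, pi) and bundle 4 undoes the radian scaling for a unit that wants turns.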

Also

http://v3.espacenet.com/publication...=A1&FT=D&date=20050331&DB=EPODOC&locale=en_gb

Figure 14 and:

[0060] As shown in FIG. 14, the input angle 1400 is first transformed into the first quadrant (0 to [pi]/2), first by mirroring it across the x axis if negative as shown in step 1410 and then across the y axis if greater than [pi]/2, as shown in step 1420. If the resulting angle is then greater than [pi]/4, the axis are exchanged, as shown in step 1430, so the final angle measures [pi]/4 or less to the x axis and the function is changed from cosine to sine or sine to cosine. The actual transformation in the float to fixed converter occurs by observing which range the input angle represents and performing the subtraction indicated for that range as outlined in FIG. 13.
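The folding described in [0060] can be written out as a short Python sketch: reduce an angle in [-pi, pi] to [0, pi/4], tracking the sign flip and the sine/cosine swap, then evaluate over that small range. This follows the FIG. 14 steps as quoted; `math.sin`/`math.cos` stand in for the lookup tables, and the function names are hypothetical, not from the patent.

```python
import math


def fold_first_octant(theta: float, want_sine: bool = True):
    """Reduce theta in [-pi, pi] to [0, pi/4] per the FIG. 14 steps.

    Returns (angle, use_sine, sign) such that
    f(theta) == sign * (sin if use_sine else cos)(angle).
    """
    sign = 1.0
    use_sine = want_sine
    # Step 1410: mirror across the x axis if negative.
    if theta < 0.0:
        theta = -theta
        if use_sine:
            sign = -sign              # sin(-t) = -sin(t); cos(-t) = cos(t)
    # Step 1420: mirror across the y axis if greater than pi/2.
    if theta > math.pi / 2:
        theta = math.pi - theta
        if not use_sine:
            sign = -sign              # cos(pi - t) = -cos(t); sin(pi - t) = sin(t)
    # Step 1430: exchange the axes if greater than pi/4.
    if theta > math.pi / 4:
        theta = math.pi / 2 - theta
        use_sine = not use_sine       # sin(pi/2 - t) = cos(t) and vice versa
    return theta, use_sine, sign


def folded_eval(theta: float, want_sine: bool = True) -> float:
    t, use_sine, sign = fold_first_octant(theta, want_sine)
    # Stand-in for a lookup table covering only [0, pi/4].
    return sign * (math.sin(t) if use_sine else math.cos(t))
```

The point of the folding is that the table only has to cover one eighth of a turn; the price is the extra range logic in front of it.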
I dunno if the code above is "magic" that achieves what the flow chart in figure 13 does, or if they do it differently now. But as you can see, it takes quite a few instructions to do a generic sine/cosine, and the basic principle is to normalise the input.

The instruction sequence is different for R600:

Code:
      0  w: MULADD      ____,  R0.x,  C0.x,  0.5      
      1  z: FRACT       ____,  PV0.w      
      2  y: MULADD      ____,  PV1.z,  C0.z,  C0.w      
      3  t: SIN         R0.x,  PV2.y
So it seems to have evolved.

But this is the D3D assembly:

Code:
    ps_3_0
    def c0, 0.159154937, 0.5, 6.28318548, -3.14159274
    dcl_color v0.x
    mad r0.x, v0.x, c0.x, c0.y
    frc r0.x, r0.x
    mad r0.x, r0.x, c0.z, c0.w
    sincos r1.y, r0.x
    mov oC0, r1.y
So, erm, what we've been seeing is D3D-specific sine...
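The D3D bytecode already contains the range reduction, and the R600 listing is a one-to-one translation of it (mad, frc, mad, then SIN). A small Python replay using the `def c0` constants, assuming that R600's SIN takes radians and that D3D9 `sincos` expects its input in [-pi, pi] (my reading, not confirmed in this thread):

```python
import math

# def c0, 0.159154937, 0.5, 6.28318548, -3.14159274
C0 = (0.159154937, 0.5, 6.28318548, -3.14159274)


def d3d_sincos_preamble(x: float) -> float:
    """Replay mad/frc/mad: wrap any angle x (radians) into [-pi, pi)."""
    r = x * C0[0] + C0[1]        # mad r0.x, v0.x, c0.x, c0.y -> x/(2pi) + 0.5
    r = r - math.floor(r)        # frc r0.x, r0.x             -> [0, 1)
    return r * C0[2] + C0[3]     # mad r0.x, r0.x, c0.z, c0.w -> [-pi, pi)


def r600_sin(x: float) -> float:
    """Same three ops, then SIN on radians (assumed R600 behaviour)."""
    return math.sin(d3d_sincos_preamble(x))
```

If that reading is right, the reduction is there to satisfy the API's input range, and the hardware's own SIN may need less (or, on Cypress, a different scaling).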

Jawed
 
Most likely, though I wouldn't read too much into a drawing that was obviously copied from the 5800 one with only a few minor edits...

More interesting is that, by splitting Cypress and putting the 6800s back in the sweet spot, AMD appears to be back in the fight for the top with a bigger single GPU in the 6900s...
 