AMD: R9xx Speculation

Well Fuad reckons Cayman is the Biggest chip ATI ever made. That seems somewhat questionable.

From fudo:
We also learned that Cayman is the biggest chip that ATI, or AMD has ever made. It looks like the size of the chip is very close to GF100 aka Fermi and the thermals should be in that range. Judging by our information Barts XT and PRO should end up pretty fast, perhaps even faster than Radeon HD 5870 and 5850 so the guys who already have 5000-series boards should find it a worthwhile upgrade.

I think this is too much :oops:
PS: Lol at Jimbo's comment on Fuad's page xD
 
Well Fuad reckons Cayman is the Biggest chip ATI ever made. That seems somewhat questionable.

edit - Perhaps he's just doubled the 230mm2 figure of Barts?

Or maybe this is AMD's attempt at "Fermi done right"? :D

So AMD sees Nvidia do Epic Fail with GF100, and decides to abandon their hugely successful sweet-spot strategy after one generation and follow the same route as Nvidia with a giant 300 watt single chip that bleeds heat like a nuclear furnace?

That seems really likely. :rolleyes: Looks like Fudo is getting his information on future AMD products from Nvidia PR again.
 
I think the odds of tessellation performance being better than what GF104 offers are pretty slim. The other question though is why would you want that, just so that it's faster on unigine extreme?
I think it's just as likely as AMD going half-rate DP with a 4-wide SPU (split/distributed T lane... edit: not implying it's the case here, take this as a "when" or "if" ).

Evergreen-class GPUs have a post-tessellator issue and it's likely Cypress should have handled that. After all, they were marketing RV770's sideport back then, although it has never been used.
 
I think the odds of tessellation performance being better than what GF104 offers are pretty slim. The other question though is why would you want that, just so that it's faster on unigine extreme?

This is an industry of bullet points and buzz words. Such "wins" go a long way.
 
So AMD sees Nvidia do Epic Fail with GF100, and decides to abandon their hugely successful sweet-spot strategy after one generation and follow the same route as Nvidia with a giant 300 watt single chip that bleeds heat like a nuclear furnace?

That seems really likely. :rolleyes: Looks like Fudo is getting his information on future AMD products from Nvidia PR again.

I think they just really like to hit the same stone, twice.
 
How often have you seen reasonably priced pre-order offers for GPUs?
That's something I thought after posting, but couldn't get back immediately to edit. :) Let's hope it does indeed include the "novelty tax".

It is indeed more difficult to cool a smaller die vs. a larger one if it uses the same amount of power, but I'm not sure this really makes enough difference here. You could, for instance, still use the same HSF; the fan would just have to spin faster (of course, that's not ideal for noise). Or the chip could be specified to run at a slightly higher temperature (though that in itself would cause higher power draw too; it might already be factored into that TDP).
Maybe that's why we heard the rumours of vapour chamber coolers being standard?

Why? If the HD6850 has the same performance as the GTX460 1GB, then it is slightly cheaper. The HD6870 is expected to be faster than the HD5850, and this shop at least only has one HD5850 which manages to beat that preorder price (barely); all the rest are more expensive (sometimes quite a bit).
I can locally get a non-reference 5850 for the equivalent of 228 Euro. The cheapest GTX 460 1GB, from Palit, is 171 Euro! (Today's Euro prices)

The other question though is why would you want that, just so that it's faster on unigine extreme?
For being future-proof. I am not a person who changes gear often; I'd like to get as much mileage from hardware as possible. For a reasonable price, ofc. :)
 
So AMD sees Nvidia do Epic Fail with GF100, and decides to abandon their hugely successful sweet-spot strategy after one generation and follow the same route as Nvidia with a giant 300 watt single chip that bleeds heat like a nuclear furnace?
In what way are they abandoning the sweet spot strategy?

NVidia introduced GF100 first and only got GF104 out 3 months ago. That's what the old strategy looks like. AMD is introducing the sweet spot chip first, and judging by its die size and performance, they're doing exactly what they did with the RV7xx except this time NVidia won't be squeaking out a marginal victory at the $400+ price point.
 
http://www.microsofttranslator.com/...pu.org/viewthread.php?tid=3405&extra=page%3D1

Tested Barts Pro running GPU Bench: throughput for most instructions is around 114-115 Ginst/s, and the frequency is 725 MHz.

115G / 725M = 158.6, so Barts Pro evidently has 160 ALUs.
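For anyone who wants to redo that division, here is a rough Python sketch. The function names are made up for illustration, and snapping to a multiple of 16 is my assumption (Evergreen-style SIMDs of 16 VLIW units each), not something the translated post states.

```python
def implied_units(throughput_ginst_per_s: float, clock_mhz: float) -> float:
    """Instructions retired per clock, i.e. how many units issue one instruction each cycle."""
    return (throughput_ginst_per_s * 1e9) / (clock_mhz * 1e6)


def nearest_multiple(value: float, step: int = 16) -> int:
    """Snap to the nearest multiple of `step` (assumption: units come in SIMDs of 16)."""
    return int(round(value / step)) * step


# Measured range from the post: 114-115 Ginst/s at 725 MHz.
for ginst in (114.0, 115.0):
    raw = implied_units(ginst, 725.0)
    print(f"{ginst:.0f} Ginst/s -> {raw:.1f} units/clock -> {nearest_multiple(raw)} units")
```

Both ends of the measured range land on 160, so the result does not hinge on which throughput figure you pick.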

In addition, POW instruction throughput is half that of ADD/MUL, and SIN/COS throughput is 1/2.5 that of ADD. Can you guess RV940's ordinary ALU and SFU arrangement?
i.e. 1/2 for POW and 1/2.5 for SIN.

For HD5870, GPU Shader Analyzer shows 129 ALU instructions for the POW shader, i.e. half MUL rate. Here's how 4 POW instructions compile:

Code:
00 ALU: ADDR(32) CNT(12) KCACHE0(CB0:0-15) 
      0  t: LOG_sat     ____,  KC0[0].x      
      1  z: MUL         ____,  KC0[0].x,  PS0      
         t: LOG_sat     ____,  KC0[1].x      
      2  w: MUL         R127.w,  KC0[1].x,  PS1      
         t: EXP_e       R127.y,  PV1.z      
      3  t: LOG_sat     ____,  PS2      
      4  x: MUL         ____,  R127.y,  PS3      
         t: EXP_e       R127.z,  R127.w      
      5  t: EXP_e       ____,  PV4.x      
      6  t: LOG_sat     ____,  PS5      
      7  y: MUL         ____,  R127.z,  PS6      
      8  t: EXP_e       R0.x,  PV7.y      
01 EXP_DONE: PIX0, R0.xxxx
END_OF_PROGRAM

SIN is a bit of a tangle to compile in GPUSA, but with a bit of fiddling it comes out as 162 instructions, i.e. 1/2.5.
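As a sanity check on those two ratios: the thread gives the GPUSA counts (129 for POW, 162 for SIN) and the claimed relative rates, but not the count for the plain MUL shader. A small sketch, assuming throughput scales inversely with the ALU instruction count, backs out the baseline each ratio implies. The function name is hypothetical.

```python
def implied_full_rate_count(count: int, relative_rate: float) -> float:
    """Baseline instruction count a full-rate shader would need for `count` to run at `relative_rate`."""
    return count * relative_rate


pow_baseline = implied_full_rate_count(129, 1 / 2)    # POW claimed at half MUL rate
sin_baseline = implied_full_rate_count(162, 1 / 2.5)  # SIN claimed at 1/2.5

print(f"POW implies a baseline of {pow_baseline:.1f}")
print(f"SIN implies a baseline of {sin_baseline:.1f}")
```

The two implied baselines come out within a fraction of an instruction of each other, so the 1/2 and 1/2.5 figures are at least mutually consistent.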

Strange thing about the SIN shader is that it's mostly MUL, MULADD and FRACT instructions. Trying to normalise the input to the SIN instruction, it seems. So, ahem, it might have no use as a test, e.g. this is 4 SIN instructions:

Code:
00 ALU: ADDR(32) CNT(30) KCACHE0(CB0:0-15) 
      0  x: MULADD      ____,  KC0[0].x,  (0x3E22F983, 0.1591549367f).x,  0.5      
         w: MULADD      ____,  KC0[1].x,  (0x3E22F983, 0.1591549367f).x,  0.5      
      1  z: FRACT       ____,  PV0.w      
         w: FRACT       ____,  PV0.x      
      2  y: MULADD      ____,  PV1.z,  (0x40C90FDB, 6.283185482f).y, -(0x40490FDB, 3.141592741f).x      
         z: MULADD      ____,  PV1.w,  (0x40C90FDB, 6.283185482f).y, -(0x40490FDB, 3.141592741f).x      
      3  x: MUL         T0.x,  PV2.y,  (0x3E22F983, 0.1591549367f).x      
         y: MUL         ____,  PV2.z,  (0x3E22F983, 0.1591549367f).x      
      4  t: SIN         ____,  PV3.y      
      5  z: MULADD      ____,  PS4,  (0x3E22F983, 0.1591549367f).x,  0.5      
         t: SIN         ____,  T0.x      
      6  x: FRACT       ____,  PV5.z      
         y: MULADD      ____,  PS5,  (0x3E22F983, 0.1591549367f).x,  0.5      
      7  x: FRACT       ____,  PV6.y      
         y: MULADD      ____,  PV6.x,  (0x40C90FDB, 6.283185482f).y, -(0x40490FDB, 3.141592741f).x      
      8  x: MUL         ____,  PV7.y,  (0x3E22F983, 0.1591549367f).x      
         w: MULADD      ____,  PV7.x,  (0x40C90FDB, 6.283185482f).z, -(0x40490FDB, 3.141592741f).y      
      9  z: MUL         ____,  PV8.w,  (0x3E22F983, 0.1591549367f).x      
         t: SIN         T0.z,  PV8.x      
     10  t: SIN         ____,  PV9.z      
     11  x: ADD         R0.x,  T0.z,  PS10      
01 EXP_DONE: PIX0, R0.xxxx
END_OF_PROGRAM

which compiles to 12 cycles.
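To see what the MULADD/FRACT/MULADD/MUL preamble is doing, here is a minimal Python sketch of the same sequence using the constants from the listing. The `hw_sin` model (input in revolutions, i.e. radians divided by 2*pi) is my assumption about how the SIN instruction interprets its operand, and all function names are made up for illustration.

```python
import math

# Constants copied from the listing (single-precision values).
INV_TWO_PI = 0.1591549367  # 0x3E22F983
TWO_PI = 6.283185482       # 0x40C90FDB
PI = 3.141592741           # 0x40490FDB


def normalise_for_sin(x: float) -> float:
    """Mirror the compiled preamble: wrap x into [-pi, pi), then scale to revolutions."""
    t = x * INV_TWO_PI + 0.5   # MULADD: radians -> revolutions, offset by half a turn
    f = t - math.floor(t)      # FRACT: keep only the fractional turn, in [0, 1)
    r = f * TWO_PI - PI        # MULADD: back to radians, now in [-pi, pi)
    return r * INV_TWO_PI      # MUL: to revolutions, in [-0.5, 0.5)


def hw_sin(u: float) -> float:
    """Assumed model of the SIN op: operand is an angle in revolutions."""
    return math.sin(2.0 * math.pi * u)


def sin_via_listing(x: float) -> float:
    return hw_sin(normalise_for_sin(x))


for x in (-10.0, -1.0, 0.5, 3.0, 100.0):
    print(x, sin_via_listing(x), math.sin(x))
```

If that model is right, every SIN costs roughly four extra full-rate instructions just for range reduction, which fits the suspicion that this shader measures the preamble as much as the SIN itself.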

An 8 instruction MUL looks like this:

Code:
00 ALU: ADDR(32) CNT(32) KCACHE0(CB0:0-15) 
      0  z: MUL         R127.z,  KC0[0].y,  KC0[1].y      
         w: MUL         R127.w,  KC0[0].x,  KC0[1].x      
      1  x: MUL         R127.x,  KC0[0].w,  KC0[1].w      
         y: MUL         R127.y,  KC0[0].z,  KC0[1].z      
      2  x: MUL         R126.x,  KC0[1].w,  PV1.x      
         y: MUL         R126.y,  KC0[1].z,  PV1.y      
         z: MUL         R126.z,  KC0[1].y,  R127.z      
         w: MUL         R126.w,  KC0[1].x,  R127.w      
      3  x: MUL         R127.x,  R127.x,  PV2.x      
         y: MUL         R127.y,  R127.y,  PV2.y      
         z: MUL         R127.z,  R127.z,  PV2.z      
         w: MUL         R127.w,  R127.w,  PV2.w      
      4  x: MUL         R126.x,  R126.x,  PV3.x      
         y: MUL         R126.y,  R126.y,  PV3.y      
         z: MUL         R126.z,  R126.z,  PV3.z      
         w: MUL         R126.w,  R126.w,  PV3.w      
      5  x: MUL         R127.x,  R127.x,  PV4.x      
         y: MUL         R127.y,  R127.y,  PV4.y      
         z: MUL         R127.z,  R127.z,  PV4.z      
         w: MUL         R127.w,  R127.w,  PV4.w      
      6  x: MUL         R126.x,  R126.x,  PV5.x      
         y: MUL         R126.y,  R126.y,  PV5.y      
         z: MUL         R126.z,  R126.z,  PV5.z      
         w: MUL         R126.w,  R126.w,  PV5.w      
      7  x: MUL         ____,  R127.x,  PV6.x      
         y: MUL         ____,  R127.y,  PV6.y      
         z: MUL         ____,  R127.z,  PV6.z      
         w: MUL         ____,  R127.w,  PV6.w      
      8  x: MUL         R0.x,  R126.w,  PV7.w      
         y: MUL         R0.y,  R126.z,  PV7.z      
         z: MUL         R0.z,  R126.y,  PV7.y      
         w: MUL         R0.w,  R126.x,  PV7.x      
01 EXP_DONE: PIX0, R0
END_OF_PROGRAM
As far as I can tell the throughput for XYZT would be the same as XYZWT in all three of these tests.

But I think it rules out XYZW with emulated transcendentals.
 
As far as I can tell the throughput for XYZT would be the same as XYZWT in all three of these tests.

But I think it rules out XYZW with emulated transcendentals.
But still, 160 or 192 VLIW units (PRO and XT => 800/960 SPs with xyzwt, or only 640/768 with xyzt) appear to be quite on the low side to reach close to Cypress performance. They really need to have widened some bottlenecks to get that performance.
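The SP counts above are just units times lanes. A quick sketch, where the 1600-SP Cypress figure (320 five-wide units) is the known HD 5870 spec rather than something from this thread, and the function name is made up:

```python
def stream_processors(vliw_units: int, lanes: int) -> int:
    """Total SPs for a given number of VLIW units and lanes per unit."""
    return vliw_units * lanes


CYPRESS_SPS = stream_processors(320, 5)  # HD 5870: 1600 SPs

for name, units in (("Barts PRO", 160), ("Barts XT", 192)):
    for layout, lanes in (("xyzwt", 5), ("xyzt", 4)):
        sps = stream_processors(units, lanes)
        share = sps / CYPRESS_SPS
        print(f"{name} {layout}: {sps} SPs ({share:.0%} of Cypress)")
```

Even the most generous case (XT with xyzwt) is only 60% of Cypress's SP count, which is why clocks or other bottlenecks would have to make up the difference.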
 