NVIDIA GF100 & Friends speculation

It would be a disaster if the TDP of any product didn't include a safety margin. If an IHV set the TDP for product X too tight, I wouldn't want to know how many cards would get fried by even conservative over-clocking exercises.


Yep. I just wanted to point out that you can't just compare the 188W TDP for Cypress with the 250W TDP for GTX480 and then expect 40% more power draw and 40% more heat generated.
The two figures aren't directly comparable; they only share the watt as a unit, which can express both heat and electric power.
 
MW@Home is double-precision, which is 1/4 performance for MUL and 1/2 performance for ADD. Also the programmer had to hand-code around the stupid compiler's inability to properly co-issue a pair of DP ADDs per clock. Not sure if the compiler has improved in this respect recently.

Jawed
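To make those fractions concrete, here is a small sketch of the per-clock issue rates described above, assuming a VLIW5 layout where the x/y/z/w lanes handle DP and the t lane sits out (the lane counts are my assumption, for illustration only):

```python
# Per-VLIW, per-clock DP issue rates implied by the post (assumed VLIW5
# layout: 4 full lanes x,y,z,w plus a transcendental t lane that does
# no DP work).
SP_OPS_PER_VLIW = 4      # x,y,z,w each issue one SP op per clock
DP_MUL_PER_VLIW = 1      # all four lanes gang up for a single DP MUL/MADD
DP_ADD_PER_VLIW = 2      # lanes pair up, so two DP ADDs per clock

mul_rate = DP_MUL_PER_VLIW / SP_OPS_PER_VLIW   # 0.25 -> "1/4 for MUL"
add_rate = DP_ADD_PER_VLIW / SP_OPS_PER_VLIW   # 0.5  -> "1/2 for ADD"
print(mul_rate, add_rate)
```

Co-issuing the second DP ADD is exactly the pairing in the last line, which is what the hand-coding had to work around.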

Guess I can help out here ;)
That is the ISA code of the shortest (and least efficient) integration kernel Milkyway@home uses, compiled from IL with Cat 8.12 (some of the issues of that old driver, which can be spotted and waste a cycle or two, may have gone away with newer versions). It is just a quite streamlined (and really dumb, pure number-crunching) implementation that basically crams as many DP MADDs into the thing as possible (besides also doing square roots, exponentials and divisions, but it doesn't use the compiler-generated versions, even where they are available :rolleyes:).

Code:
; --------  Disassembly --------------------
00 TEX: ADDR(336) CNT(2) VALID_PIX 
      0  SAMPLE R4, R0.xyxx, t4, s0  UNNORM(XYZW) 
      1  SAMPLE R5, R0.xyxx, t5, s0  UNNORM(XYZW) 
01 ALU: ADDR(32) CNT(4) 
      2  x: MOV         R1.x,  R0.y      
         y: MOV         R0.y,  0.0f      
         z: MOV         R1.z,  0.0f      
         t: MOV         R6.y,  R0.x      
02 TEX: ADDR(340) CNT(3) VALID_PIX 
      3  SAMPLE R0.xy__, R0.xyxx, t2, s0  UNNORM(XYZW) 
      4  SAMPLE R7, R1.xzxx, t0, s0  UNNORM(XYZW) 
      5  SAMPLE R8.xy__, R1.xzxx, t1, s0  UNNORM(XYZW) 
03 ALU: ADDR(36) CNT(10) KCACHE0(CB2:0-15) 
      6  x: MUL_64      R9.x,  R0.y,  KC0[0].y      
         y: MUL_64      R9.y,  R0.y,  KC0[0].y      
         z: MUL_64      ____,  R0.y,  KC0[0].y      
         w: MUL_64      ____,  R0.x,  KC0[0].x      
         t: MOV         R0.z,  0.0f      
      7  x: MOV         R10.x,  0.0f      
         y: MOV         R10.y,  0.0f      
         w: MOV         R0.w,  0.0f      
         t: MOV         R6.x,  0.0f      
      8  t: I_TO_F      R1.w,  KC0[4].x      
04 LOOP_DX10 i0 FAIL_JUMP_ADDR(10) 
    05 ALU_BREAK: ADDR(46) CNT(1) 
          9  x: PREDGT      ____,  R1.w,  R6.x      UPDATE_EXEC_MASK UPDATE_PRED 
    06 TEX: ADDR(346) CNT(1) VALID_PIX 
         10  SAMPLE R3, R6.xyxx, t3, s0  UNNORM(XYZW) 
    07 ALU: ADDR(47) CNT(121) KCACHE0(CB2:0-15) KCACHE1(CB0:0-15) 
         11  x: MULADD_64   T1.x,  R3.y,  R7.y,  KC0[3].y      
             y: MULADD_64   T1.y,  R3.y,  R7.y,  KC0[3].y      
             z: MULADD_64   ____,  R3.y,  R7.y,  KC0[3].y      
             w: MULADD_64   ____,  R3.x,  R7.x,  KC0[3].x      
             t: ADD         R6.x,  R6.x,  1.0f      
         12  x: MUL_64      T2.x,  R3.y,  R7.w      
             y: MUL_64      T2.y,  R3.y,  R7.w      
             z: MUL_64      ____,  R3.y,  R7.w      
             w: MUL_64      ____,  R3.x,  R7.z      
         13  x: ADD_64      R0.x,  PV12.y,  KC1[1].w      
             y: ADD_64      R0.y,  PV12.x,  KC1[1].z      
             z: ADD_64      T0.z,  T1.y,  KC1[0].w      
             w: ADD_64      T0.w,  T1.x,  KC1[0].z      
         14  x: MUL_64      T3.x,  R3.y,  R8.y      
             y: MUL_64      T3.y,  R3.y,  R8.y      
             z: MUL_64      ____,  R3.y,  R8.y      
             w: MUL_64      ____,  R3.x,  R8.x      
         15  x: MUL_64      T0.x,  KC1[0].y,  T0.w      
             y: MUL_64      T0.y,  KC1[0].y,  T0.w      
             z: MUL_64      ____,  KC1[0].y,  T0.w      
             w: MUL_64      ____,  KC1[0].x,  T0.z      
         16  z: ADD_64      T1.z,  T3.y,  KC1[2].w      
             w: ADD_64      T1.w,  T3.x,  KC1[2].z      
         17  x: MULADD_64   T0.x,  KC1[1].y,  R0.y,  T0.y      
             y: MULADD_64   T0.y,  KC1[1].y,  R0.y,  T0.y      
             z: MULADD_64   ____,  KC1[1].y,  R0.y,  T0.y      
             w: MULADD_64   ____,  KC1[1].x,  R0.x,  T0.x      
         18  x: MUL_64      T1.x,  T1.y,  T1.y      
             y: MUL_64      T1.y,  T1.y,  T1.y      
             z: MUL_64      ____,  T1.y,  T1.y      
             w: MUL_64      ____,  T1.x,  T1.x      
         19  x: MULADD_64   T0.x,  KC1[2].y,  T1.w,  T0.y      
             y: MULADD_64   T0.y,  KC1[2].y,  T1.w,  T0.y      
             z: MULADD_64   ____,  KC1[2].y,  T1.w,  T0.y      
             w: MULADD_64   ____,  KC1[2].x,  T1.z,  T0.x      
         20  x: MULADD_64   T1.x,  T2.y,  T2.y,  T1.y      
             y: MULADD_64   T1.y,  T2.y,  T2.y,  T1.y      
             z: MULADD_64   ____,  T2.y,  T2.y,  T1.y      
             w: MULADD_64   ____,  T2.x,  T2.x,  T1.x      
             t: MOV         T0.y, -PV19.y      
         21  x: MUL_64      ____,  T3.y,  T3.y      
             y: MUL_64      ____,  T3.y,  T3.y      
             z: MUL_64      ____,  T3.y,  T3.y      
             w: MUL_64      ____,  T3.x,  T3.x      
         22  x: MULADD_64   T2.x,  PV21.y,  KC0[2].y,  T1.y      
             y: MULADD_64   T2.y,  PV21.y,  KC0[2].y,  T1.y      
             z: MULADD_64   ____,  PV21.y,  KC0[2].y,  T1.y      
             w: MULADD_64   ____,  PV21.x,  KC0[2].x,  T1.x      
         23  x: MULADD_64   T1.x,  T0.y,  KC1[0].y,  T0.w      
             y: MULADD_64   T1.y,  T0.y,  KC1[0].y,  T0.w      
             z: MULADD_64   ____,  T0.y,  KC1[0].y,  T0.w      
             w: MULADD_64   ____,  T0.x,  KC1[0].x,  T0.z      
         24  x: F64_TO_F32  ____,  T2.y      
             y: F64_TO_F32  ____,  T2.x      
         25  x: MULADD_64   T3.x,  T0.y,  KC1[1].y,  R0.y      
             y: MULADD_64   T3.y,  T0.y,  KC1[1].y,  R0.y      
             z: MULADD_64   ____,  T0.y,  KC1[1].y,  R0.y      
             w: MULADD_64   ____,  T0.x,  KC1[1].x,  R0.x      
             t: RSQ_FF      T0.w,  PV24.x      
         26  x: MUL_64      ____,  T1.y,  T1.y      
             y: MUL_64      ____,  T1.y,  T1.y      
             z: MUL_64      ____,  T1.y,  T1.y      
             w: MUL_64      ____,  T1.x,  T1.x      
         27  x: MULADD_64   T3.x,  T3.y,  T3.y,  PV26.y      
             y: MULADD_64   T3.y,  T3.y,  T3.y,  PV26.y      
             z: MULADD_64   ____,  T3.y,  T3.y,  PV26.y      
             w: MULADD_64   ____,  T3.x,  T3.x,  PV26.x      
         28  x: MULADD_64   ____,  T0.y,  KC1[2].y,  T1.w      
             y: MULADD_64   ____,  T0.y,  KC1[2].y,  T1.w      
             z: MULADD_64   ____,  T0.y,  KC1[2].y,  T1.w      
             w: MULADD_64   ____,  T0.x,  KC1[2].x,  T1.z      
         29  x: MULADD_64   R0.x,  PV28.y,  PV28.y,  T3.y      
             y: MULADD_64   R0.y,  PV28.y,  PV28.y,  T3.y      
             z: MULADD_64   ____,  PV28.y,  PV28.y,  T3.y      
             w: MULADD_64   ____,  PV28.x,  PV28.x,  T3.x      
         30  x: F32_TO_F64  T3.x, -T0.w      
             y: F32_TO_F64  T3.y,  0.0f      
         31  x: MUL_64      ____,  PV30.y,  PV30.y      
             y: MUL_64      ____,  PV30.y,  PV30.y      
             z: MUL_64      ____,  PV30.y,  PV30.y      
             w: MUL_64      ____,  PV30.x,  PV30.x      
         32  x: MULADD_64   ____,  T2.y,  PV31.y,  (0xC0080000, -2.125f).x      
             y: MULADD_64   ____,  T2.y,  PV31.y,  (0xC0080000, -2.125f).x      
             z: MULADD_64   ____,  T2.y,  PV31.y,  (0xC0080000, -2.125f).x      
             w: MULADD_64   ____,  T2.x,  PV31.x,  0.0f      
         33  x: MUL_64      T3.x,  T3.y,  PV32.y      
             y: MUL_64      T3.y,  T3.y,  PV32.y      
             z: MUL_64      ____,  T3.y,  PV32.y      
             w: MUL_64      ____,  T3.x,  PV32.x      
         34  x: MUL_64      T2.x,  T2.y,  PV33.y      
             y: MUL_64      T2.y,  T2.y,  PV33.y      
             z: MUL_64      ____,  T2.y,  PV33.y      
             w: MUL_64      ____,  T2.x,  PV33.x      
         35  x: MUL_64      ____,  T3.y,  PV34.y      
             y: MUL_64      ____,  T3.y,  PV34.y      
             z: MUL_64      ____,  T3.y,  PV34.y      
             w: MUL_64      ____,  T3.x,  PV34.x      
         36  x: MULADD_64   ____,  PV35.y,  (0xBFB00000, -1.375f).y,  (0x3FE80000, 1.8125f).x      
             y: MULADD_64   ____,  PV35.y,  (0xBFB00000, -1.375f).y,  (0x3FE80000, 1.8125f).x      
             z: MULADD_64   ____,  PV35.y,  (0xBFB00000, -1.375f).y,  (0x3FE80000, 1.8125f).x      
             w: MULADD_64   ____,  PV35.x,  0.0f,  0.0f      
         37  x: MUL_64      T2.x,  PV36.y,  T2.y      
             y: MUL_64      T2.y,  PV36.y,  T2.y      
             z: MUL_64      ____,  PV36.y,  T2.y      
             w: MUL_64      ____,  PV36.x,  T2.x      
         38  x: ADD_64      T3.x,  PV37.y,  KC0[1].y      
             y: ADD_64      T3.y,  PV37.x,  KC0[1].x      
         39  x: MUL_64      ____,  T2.y,  PV38.y      
             y: MUL_64      ____,  T2.y,  PV38.y      
             z: MUL_64      ____,  T2.y,  PV38.y      
             w: MUL_64      ____,  T2.x,  PV38.x      
         40  x: MUL_64      ____,  PV39.y,  T3.y      
             y: MUL_64      ____,  PV39.y,  T3.y      
             z: MUL_64      ____,  PV39.y,  T3.y      
             w: MUL_64      ____,  PV39.x,  T3.x      
         41  x: MUL_64      R2.x,  PV40.y,  T3.y      
             y: MUL_64      R2.y,  PV40.y,  T3.y      
             z: MUL_64      ____,  PV40.y,  T3.y      
             w: MUL_64      ____,  PV40.x,  T3.x      
    08 ALU: ADDR(168) CNT(127) KCACHE0(CB1:0-15) 
         42  x: MUL_64      ____,  R0.y,  KC0[0].y      
             y: MUL_64      ____,  R0.y,  KC0[0].y      
             z: MUL_64      ____,  R0.y,  KC0[0].y      
             w: MUL_64      ____,  R0.x,  KC0[0].x      
         43  x: MUL_64      R0.x,  PV42.y,  (0x3FF71547, 1.930336833f).y      
             y: MUL_64      R0.y,  PV42.y,  (0x3FF71547, 1.930336833f).y      
             z: MUL_64      ____,  PV42.y,  (0x3FF71547, 1.930336833f).y      
             w: MUL_64      ____,  PV42.x,  (0x652B82FE, 5.062131550e22f).x      
         44  x: F64_TO_F32  T3.x,  R2.y      
             y: F64_TO_F32  ____,  R2.x      
             z: FRACT_64    T2.z,  PV43.y      
             w: FRACT_64    T2.w,  PV43.x      
             t: MOV         R2.y, -R2.y      
         45  x: ADD_64      T0.x,  PV44.w,  (0xBFE00000, -1.75f).x      
             y: ADD_64      T0.y,  PV44.z,  0.0f      
             t: MOV         R0.y, -R0.y      
         46  x: MUL_64      T1.x,  PV45.y,  PV45.y      
             y: MUL_64      T1.y,  PV45.y,  PV45.y      
             z: MUL_64      ____,  PV45.y,  PV45.y      
             w: MUL_64      ____,  PV45.x,  PV45.x      
             t: RCP_FF      T1.z,  T3.x      
         47  x: MULADD_64   T3.x,  PV46.y,  (0x3EF52B5C, 0.4788464308f).w,  (0x3F84AA4E, 1.036447287f).y      
             y: MULADD_64   T3.y,  PV46.y,  (0x3EF52B5C, 0.4788464308f).w,  (0x3F84AA4E, 1.036447287f).y      
             z: MULADD_64   ____,  PV46.y,  (0x3EF52B5C, 0.4788464308f).w,  (0x3F84AA4E, 1.036447287f).y      
             w: MULADD_64   ____,  PV46.x,  (0x4D1F00B9, 166726544.0f).z,  (0xB649A98F, -0.000003005003009f).x      
         48  x: MULADD_64   T2.x,  T1.y,  (0x3E8657CD, 0.2623886168f).w,  (0x3F33185F, 0.6995906234f).y      
             y: MULADD_64   T2.y,  T1.y,  (0x3E8657CD, 0.2623886168f).w,  (0x3F33185F, 0.6995906234f).y      
             z: MULADD_64   ____,  T1.y,  (0x3E8657CD, 0.2623886168f).w,  (0x3F33185F, 0.6995906234f).y      
             w: MULADD_64   ____,  T1.x,  (0xD06316DC, -1.523970458e10f).z,  (0x478FF1EB, 73699.83594f).x      
         49  x: MULADD_64   T3.x,  T1.y,  T3.y,  (0x3FE62E42, 1.798286676f).y      
             y: MULADD_64   T3.y,  T1.y,  T3.y,  (0x3FE62E42, 1.798286676f).y      
             z: MULADD_64   ____,  T1.y,  T3.y,  (0x3FE62E42, 1.798286676f).y      
             w: MULADD_64   ____,  T1.x,  T3.x,  (0xFEFA39EF, -1.663039037e38f).x      
         50  x: MULADD_64   T2.x,  T1.y,  T2.y,  (0x3FABF3E7, 1.343380809f).y      
             y: MULADD_64   T2.y,  T1.y,  T2.y,  (0x3FABF3E7, 1.343380809f).y      
             z: MULADD_64   ____,  T1.y,  T2.y,  (0x3FABF3E7, 1.343380809f).y      
             w: MULADD_64   ____,  T1.x,  T2.x,  (0x389CEFF9, 0.00007483358058f).x      
         51  x: MUL_64      T0.x,  T0.y,  T3.y      
             y: MUL_64      T0.y,  T0.y,  T3.y      
             z: MUL_64      ____,  T0.y,  T3.y      
             w: MUL_64      ____,  T0.x,  T3.x      
         52  x: MULADD_64   ____,  T1.y,  T2.y,  (0x3FF00000, 1.875f).x      
             y: MULADD_64   ____,  T1.y,  T2.y,  (0x3FF00000, 1.875f).x      
             z: MULADD_64   ____,  T1.y,  T2.y,  (0x3FF00000, 1.875f).x      
             w: MULADD_64   ____,  T1.x,  T2.x,  0.0f      
         53  x: MULADD_64   T3.x,  T0.y,  (0xBFE00000, -1.75f).x,  PV52.y      
             y: MULADD_64   T3.y,  T0.y,  (0xBFE00000, -1.75f).x,  PV52.y      
             z: MULADD_64   ____,  T0.y,  (0xBFE00000, -1.75f).x,  PV52.y      
             w: MULADD_64   ____,  T0.x,  0.0f,  PV52.x      
         54  x: F64_TO_F32  ____,  PV53.y      
             y: F64_TO_F32  ____,  PV53.x      
             w: MOV         T3.w, -PV53.y      
         55  z: F32_TO_F64  T0.z,  T1.z      
             w: F32_TO_F64  T0.w,  0.0f      
             t: RCP_FF      ____,  PV54.x      
         56  z: F32_TO_F64  T1.z,  PS55      
             w: F32_TO_F64  T1.w,  0.0f      
         57  x: MULADD_64   ____,  T3.w,  PV56.w,  (0x3FF00000, 1.875f).x      
             y: MULADD_64   ____,  T3.w,  PV56.w,  (0x3FF00000, 1.875f).x      
             z: MULADD_64   ____,  T3.w,  PV56.w,  (0x3FF00000, 1.875f).x      
             w: MULADD_64   ____,  T3.x,  PV56.z,  0.0f      
         58  x: MULADD_64   T2.x,  T1.w,  PV57.y,  T1.w      
             y: MULADD_64   T2.y,  T1.w,  PV57.y,  T1.w      
             z: MULADD_64   ____,  T1.w,  PV57.y,  T1.w      
             w: MULADD_64   ____,  T1.z,  PV57.x,  T1.z      
         59  x: MULADD_64   T1.x,  R2.y,  T0.w,  (0x3FF00000, 1.875f).x      
             y: MULADD_64   T1.y,  R2.y,  T0.w,  (0x3FF00000, 1.875f).x      
             z: MULADD_64   ____,  R2.y,  T0.w,  (0x3FF00000, 1.875f).x      
             w: MULADD_64   ____,  R2.x,  T0.z,  0.0f      
         60  x: MUL_64      R1.x,  T0.y,  T2.y      
             y: MUL_64      R1.y,  T0.y,  T2.y      
             z: MUL_64      ____,  T0.y,  T2.y      
             w: MUL_64      ____,  T0.x,  T2.x      
         61  x: MULADD_64   T1.x,  T0.w,  T1.y,  T0.w      
             y: MULADD_64   T1.y,  T0.w,  T1.y,  T0.w      
             z: MULADD_64   ____,  T0.w,  T1.y,  T0.w      
             w: MULADD_64   ____,  T0.z,  T1.x,  T0.z      
         62  z: ADD_64      T2.z,  T2.w,  R0.y      
             w: ADD_64      T2.w,  T2.z,  R0.x      
         63  x: MULADD_64   ____,  T3.w,  R1.y,  T0.y      
             y: MULADD_64   ____,  T3.w,  R1.y,  T0.y      
             z: MULADD_64   ____,  T3.w,  R1.y,  T0.y      
             w: MULADD_64   ____,  T3.x,  R1.x,  T0.x      
         64  x: MULADD_64   T2.x,  PV63.y,  T2.y,  R1.y      
             y: MULADD_64   T2.y,  PV63.y,  T2.y,  R1.y      
             z: MULADD_64   ____,  PV63.y,  T2.y,  R1.y      
             w: MULADD_64   ____,  PV63.x,  T2.x,  R1.x      
         65  x: F64_TO_F32  ____,  T2.w      
             y: F64_TO_F32  ____,  T2.z      
         66  x: MUL_64      T3.x,  R3.w,  T1.y      
             y: MUL_64      T3.y,  R3.w,  T1.y      
             z: MUL_64      ____,  R3.w,  T1.y      
             w: MUL_64      ____,  R3.z,  T1.x      
             t: F_TO_I      T2.w, -PV65.x      
         67  x: MULADD_64   T0.x,  R2.y,  PV66.y,  R3.w      
             y: MULADD_64   T0.y,  R2.y,  PV66.y,  R3.w      
             z: MULADD_64   ____,  R2.y,  PV66.y,  R3.w      
             w: MULADD_64   ____,  R2.x,  PV66.x,  R3.z      
         68  x: MULADD_64   T2.x,  T2.y,  (0x3FF6A09E, 1.926776648f).y,  (0x3FF6A09E, 1.926776648f).y      
             y: MULADD_64   T2.y,  T2.y,  (0x3FF6A09E, 1.926776648f).y,  (0x3FF6A09E, 1.926776648f).y      
             z: MULADD_64   ____,  T2.y,  (0x3FF6A09E, 1.926776648f).y,  (0x3FF6A09E, 1.926776648f).y      
             w: MULADD_64   ____,  T2.x,  (0x667F3BCD, 3.013266457e23f).x,  (0x667F3BCD, 3.013266457e23f).x      
         69  x: MULADD_64   ____,  T0.y,  T1.y,  T3.y      
             y: MULADD_64   ____,  T0.y,  T1.y,  T3.y      
             z: MULADD_64   ____,  T0.y,  T1.y,  T3.y      
             w: MULADD_64   ____,  T0.x,  T1.x,  T3.x      
         70  x: LDEXP_64    R2.x,  T2.y,  T2.w      
             y: LDEXP_64    R2.y,  T2.x,  T2.w      
             z: ADD_64      R0.z,  R0.w,  PV69.y      
             w: ADD_64      R0.w,  R0.z,  PV69.x      
         71  x: MULADD_64   R10.x,  R2.y,  R3.w,  R10.y      
             y: MULADD_64   R10.y,  R2.y,  R3.w,  R10.y      
             z: MULADD_64   ____,  R2.y,  R3.w,  R10.y      
             w: MULADD_64   ____,  R2.x,  R3.z,  R10.x      
09 ENDLOOP i0 PASS_JUMP_ADDR(5) 
10 ALU: ADDR(295) CNT(36) 
     72  x: MUL_64      ____,  R0.w,  R9.y      
         y: MUL_64      ____,  R0.w,  R9.y      
         z: MUL_64      T0.z,  R0.w,  R9.y      
         w: MUL_64      T0.w,  R0.z,  R9.x      
     73  x: MUL_64      T0.x,  R10.y,  R9.y      
         y: MUL_64      T0.y,  R10.y,  R9.y      
         z: MUL_64      ____,  R10.y,  R9.y      
         w: MUL_64      ____,  R10.x,  R9.x      
     74  x: ADD_64      R1.x,  R4.y,  T0.w      
         y: ADD_64      R1.y,  R4.x,  T0.z      
         z: ADD_64      R0.z,  R5.y,  PV73.y      VEC_120 
         w: ADD_64      R0.w,  R5.x,  PV73.x      VEC_120 
     75  x: MOV         ____,  PV74.z      
         y: MOV         ____, -PV74.w      
         z: MOV         ____,  PV74.x      
         t: MOV         ____, -PV74.y      
     76  x: ADD_64      ____,  PV75.y,  R5.y      
         y: ADD_64      ____,  PV75.x,  R5.x      
         z: ADD_64      ____,  PS75,  R4.y      VEC_021 
         w: ADD_64      ____,  PV75.z,  R4.x      VEC_021 
     77  x: ADD_64      ____,  PV76.y,  T0.y      
         y: ADD_64      ____,  PV76.x,  T0.x      
         z: ADD_64      ____,  PV76.w,  T0.w      
         w: ADD_64      ____,  PV76.z,  T0.z      
     78  x: ADD_64      R0.x,  R5.w,  PV77.y      
         y: ADD_64      R0.y,  R5.z,  PV77.x      
         z: ADD_64      R1.z,  R4.w,  PV77.w      VEC_120 
         w: ADD_64      R1.w,  R4.z,  PV77.z      VEC_120 
     79  x: MOV         R3.x,  R0.z      
         y: MOV         R3.y,  R0.w      
         z: MOV         R3.z,  PV78.x      
         w: MOV         R3.w,  PV78.y      
     80  x: MOV         R2.x,  R1.x      
         y: MOV         R2.y,  R1.y      
         z: MOV         R2.z,  R1.z      
         w: MOV         R2.w,  R1.w      
11 EXP_DONE: PIX0, R2  BRSTCNT(1) 
END_OF_PROGRAM
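For readers who don't speak R700 ISA: the loop body above fetches a value per iteration, grinds through chains of DP MADDs, refines a reciprocal square root, and folds an exponential (range reduction with log2(e), polynomial, then LDEXP) into running sums. This is NOT the real Milkyway@home math, just a hypothetical Python sketch of the shape of such an integration loop; all names and formulas here are illustrative:

```python
import math

def integrate(points, consts, n):
    """Hypothetical sketch of the kind of loop in the dump above: per
    iteration, fetch a quadrature node (the SAMPLE), accumulate squared
    distances via MADD chains, take a reciprocal square root (RSQ plus
    Newton-Raphson refinement), and fold an exp() into a running sum
    (range reduction + LDEXP in the ISA). Purely illustrative math."""
    acc = 0.0
    for i in range(n):
        x, y, w = points[i]                           # SAMPLE from a table
        r2 = (x - consts[0])**2 + (y - consts[1])**2  # DP MADD chains
        inv_r = 1.0 / math.sqrt(r2)                   # RSQ + refinement
        acc += w * math.exp(-r2) * inv_r              # EXP via ldexp path
    return acc
```

The ISA version gets its speed by keeping every VLIW slot busy with the .x/.y + .z/.w double-pumping visible in the dump; the Python above only shows the dataflow.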

Thanks for the answer guys ..

For any shader core to perform bit shifts/AND/OR, does it need native hardware support, or could it be done in software with no penalty?

I know Cypress ALUs have bit shift/AND/OR capability, but GT200 ALUs don't have it; will they need software emulation to do it?
 
[attached graph: power-load.gif]



GTX 285 has 204W TDP. Load power is 330W.
GTX 480, taking 250W, has 45W more. 375W or so?

I'll allow ±10W for measurement error, so call it up to 385W.

Techreport's HD5870 is 290W.

If nApoleon is using a lower-volted HD5870, or a test that doesn't stress Cypress that much, system load could end up more like the 5850... 255W.

130W could be a bit of a stretch, but 90-100W isn't, considering how thrifty the RV800 archi has been in real life workloads compared to its board power/TDP.
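The arithmetic above boils down to a fixed "rest of system" baseline plus the card's TDP; here it is spelled out with the figures from the post (the baseline subtraction is my framing of the estimate, not a measured number):

```python
# Back-of-the-envelope system-load estimate: take a measured system draw
# with a known card, subtract its TDP to get a "rest of system" baseline
# (CPU, board, PSU losses), then add the new card's TDP.
GTX285_TDP = 204            # W, from the post
GTX285_SYSTEM_LOAD = 330    # W, measured system load from the post

baseline = GTX285_SYSTEM_LOAD - GTX285_TDP   # ~126 W of non-GPU draw

def estimate_system_load(gpu_tdp):
    """Crude estimate: assumes the rest of the system draws the same."""
    return baseline + gpu_tdp

print(estimate_system_load(250))   # GTX 480 at 250 W TDP
```

This lands right on the "375W or so" figure, before the ±10W fudge.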

Some guys did some measurements recently, directly on the card, and the Radeon 5K cards seem to be on par with the TDP AMD announced! And that's in Furmark!
 
For any shader core to perform bit shifts/AND/OR, does it need native hardware support, or could it be done in software with no penalty?

I know Cypress ALUs have bit shift/AND/OR capability, but GT200 ALUs don't have it; will they need software emulation to do it?

I don't think it's efficient to simulate shifts or bitwise operations (with the exception that shift-left can be simulated with integer addition). G8X/G9X/GT200 are all able to do 32-bit shifts and bitwise operations in the ALU (8 per cycle for each MP).
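The shift-left exception works because shifting left by one bit is just doubling, i.e. one integer add; a sketch of that, and of why no such trick exists for the other bitwise ops:

```python
# x << n is n doublings, i.e. n integer ADDs; shift-right, AND, OR have
# no comparably cheap add-based emulation, which is the point above.
def shl_by_adds(x, n):
    for _ in range(n):
        x = x + x        # one integer ADD per bit of shift
    return x

print(shl_by_adds(5, 3))   # same as 5 << 3
```

For a hardware ALU this is still n dependent adds versus one native shift, so even the "free" case only helps for small, constant shift amounts.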
 
Why should it not scale up?
<2 million triangles per frame in Heaven with such performance seems to indicate something's really unhappy.

If the game is writing tessellation factors to memory for a second pass, then technically this would scale (at least with bandwidth).

If performance is being hindered by other things, such as rasterisation efficiency on small triangles, inadequate post-transform cache capacity, or a flood of hardware threads for the ALUs to deal with, then these things won't scale without real changes.

It's all guesswork, anyhow. But GTX480 is clearly dramatically faster.

Jawed
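A quick sanity check on the "<2 million triangles per frame" remark: even at a healthy frame rate that triangle load is a small fraction of a classic one-triangle-per-clock setup budget, which is why raw setup rate alone doesn't explain the unhappiness. Frame rate and core clock below are my assumptions, for illustration:

```python
# How much of a 1 tri/clock setup budget does Heaven's triangle load eat?
tris_per_frame = 2_000_000       # "<2 million triangles per frame"
fps = 30                         # assumed frame rate
core_clock_hz = 700_000_000      # assumed GTX 480 core clock

tris_per_second = tris_per_frame * fps
setup_budget = core_clock_hz     # classic 1 triangle/clock setup rate
utilisation = tris_per_second / setup_budget
print(f"{utilisation:.1%} of the setup budget")   # well under 10%
```

So if performance still tanks, the bottleneck is somewhere in the list above (small-triangle rasterisation, caches, thread flood), not in headline triangle throughput.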
 
Yep. I just wanted to point out that you can't just compare the 188W TDP for Cypress with the 250W TDP for GTX480 and then expect 40% more power draw and 40% more heat generated.
The two figures aren't directly comparable; they only share the watt as a unit, which can express both heat and electric power.

Point taken and acknowledged.
 
Right.

There's an explanation for all the wild power usage / SP numbers.

It seems the 512CC will be reserved for the B1 part, which is currently slated for Q3.

295W is the power consumption of the part with 512CC and 725+MHz core (not going to see these for a while; I guess not everyone liked them).
275W is the power consumption of the A3 part, with 480CC, 725+MHz core and 1050 mem, i.e. OC GTX480 models.
250W is the power consumption of the A3 part, with 480CC, 700MHz core and ~950 mem, i.e. GTX480.

So well.. everyone was right.

Nope for the 295W part of the story.
 
<2 million triangles per frame in Heaven with such performance seems to indicate something's really unhappy.

If the game is writing tessellation factors to memory for a second pass, then technically this would scale (at least with bandwidth).

If performance is being hindered by other things, such as rasterisation efficiency on small triangles, inadequate post-transform cache capacity, or a flood of hardware threads for the ALUs to deal with, then these things won't scale without real changes.

It's all guesswork, anyhow. But GTX480 is clearly dramatically faster.

Jawed

Maybe GTX480 doesn't have enough processing power for tessellation and pixel-shader calculation at the same time. Unigine uses a lot more pixel shading than nVidia's water demo. And in the water demo they lost only 50% of the frame rate after they increased the triangle count by 1000x.
 
Nobody is saying the architecture failed. Just that there is some process on the GPU that isn't much faster than it was on prior architectures.
I can think of: input assembly, PTVC, setup, high level command processing and, theoretically, hardware thread generation (not sure about this: linked to the hardware only being able to support 512 hardware threads across the entire chip, I believe) as elements that haven't changed in throughput per clock or capacity.

This in effect represents the unchanged sequential part of the workload when comparing the two cards. So whose fault is it that a particular process isn't any faster between hardware generations? The IHV's or the ISV's?
How do you know that the unchanged part of the GPU is causing the supposed vast loss in scaling? Or, if you prefer, what makes you think these unchanged parts are a significant bottleneck in these games?

These parts of the architecture are definitely a scaling limitation, but I've not seen it quantified anywhere.

In the hypothetical 35% scaling case you first have to show the game can scale better. Armed with that you can then poke around in the bits of the architecture that haven't scaled.

Jawed
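The "unchanged sequential part" argument above is Amdahl's law in GPU clothing: if a fraction of frame time sits in fixed-function stages that didn't get faster, speeding up everything else only helps so much. A minimal sketch (the 30% figure is purely illustrative, not a claim about GF100):

```python
# Amdahl's law: s = serial (unscaled) fraction, k = speedup of the rest.
def amdahl_speedup(s, k):
    return 1.0 / (s + (1.0 - s) / k)

# Illustrative only: with 30% of the frame in unscaled fixed-function
# stages, even infinitely faster shaders cap overall speedup near 3.3x.
print(amdahl_speedup(0.3, 1e9))
```

Which is why quantifying that serial fraction, per game, is the step missing from the argument so far.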
 
Those Crysis numbers are quite scary. Immature Nvidia drivers? Architecture limitations as far as Crysis is concerned?
 

GTX480 Vantage X=9241
What about this?

http://i44.tinypic.com/99mfyq.jpg

Why change your mind? ;)

I think those leaks are not true, simply because it is hard to believe the GTX 470 would be only 20% faster than GTX285!

Edit :
I don't think it's efficient to simulate shifts or bitwise operations (with the exception where shift left can be simulated with integer addition). G8X/G9X/GT200 all are able to do 32 bits shift and bitwise operations in the ALU (8 per cycle for each MP).
Thanks for the answer ..
 
Those Crysis numbers are quite scary. Architecture limitations as far as Crysis is concerned?

Why? GF100 is not really a GTX285x2. 50-60% is the number you should expect from GF100 over GTX285.
But I'm more interested in the DX11 games and tired of Crysis. Where are the Metro2033 leaks?
 
On GF104, if I wanted to feed my "GF100 disabled TMU rumor" I'd say that GF104 also has 64TMUs. Power is again an issue.
Since apparently I wasn't too unlucky with my previous info (480 SP for all intents and purposes), I figure I need to add some more precision to be absolutely certain that I turn out wrong about GF10x in general.

GF102: 320xSP/60xTMU/2xRasterizer/256-bit GDDR5
GF104: 160xSP/30xTMU/1xRasterizer/128-bit GDDR5
GF106: 64xSP/16xTMU/1xRasterizer/128-bit GDDR3
GF108: 32xSP/8xTMU/?xRasterizer/64-bit DDR3
(i.e. they could manipulate the ALU-TEX ratio by changing the number of half-quad TMUs associated to every 32-wide SM, the latter being perfectly copy-pasted from chip to chip)

If that doesn't turn out wrong, I don't know what will!
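Checking the ALU:TEX manipulation the guess describes against the guessed lineup itself (SP and TMU counts are the post's speculation, not real specs):

```python
# Speculated SP/TMU counts from the post; the point is that the ratio
# can change down the stack while the 32-wide SM stays copy-pasted.
lineup = {
    "GF102": (320, 60),
    "GF104": (160, 30),
    "GF106": (64, 16),
    "GF108": (32, 8),
}
for chip, (sp, tmu) in lineup.items():
    print(chip, sp / tmu)   # big chips ~5.3 SP per TMU, small ones 4.0
```

So under this guess the two big chips share one ratio and the two small ones another, i.e. two TMU configurations per SM rather than four distinct designs.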
 