NVIDIA GF100 & Friends speculation

Man I can't wait for the 26th so these 480SP rumors will finally die. 512SP = GTX 480, not 480SP. Whoever started that should be hunted down; oh, I think it was Charlie.


It wasn't Charlie, it was based on people who had access to or saw engineering samples, so you can't blame it on him :LOL:
 
In my opinion, the 64 FMA/clk could be true in a certain scenario: Nvidia wants to replace the full Fermi for gaming markets as soon as possible with a chip that uses a similar approach as AMD to achieve double precision. You cannot market a degradation (256 FMA vs. 128/64 FMA) as a successor.

WRT markets, I'd say DP is mainly for compute clusters, not so much for workstations or even movie rendering farms (I guess!). Thus, they could very well scale back DP performance relative to SP in their lower-end parts without making too many people angry.

I'm very much looking forward to seeing the final, final specifications of Fermi.
 
In my opinion, the 64 FMA/clk could be true in a certain scenario: Nvidia wants to replace the full Fermi for gaming markets as soon as possible with a chip that uses a similar approach as AMD to achieve double precision.

Won't they just have the same problem on the next Tesla/Quadro refresh? The top-end GeForces will probably always share the same chip with the flagship compute and professional products.
 
Yeah, um, there's a reason why we use floating point numbers for physics simulations. Doing this would merely reduce the precision further.

If 1001.003 represented meters, then 1000 would be 1 km and 0.001 would be 1 mm. If a velocity, then 0.001 m/s would be extremely slow and 1000 m/s would be faster than the average rifle bullet.
I don't think that anything beyond 10 decimals would have much meaning.
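For what it's worth, the issue isn't the count of decimals as such but the ~7 significant digits a float carries in total; whether the millimeter survives depends on how big the whole number is. A minimal C sketch (illustrative values only, not from any real simulation):

Code:
#include <stdio.h>

int main(void)
{
    /* At ~1 km a float still resolves millimeters... */
    float near_origin = 1001.003f;
    /* ...but at ~100 km adjacent floats are already ~0.008 m apart,
       so the millimeter part is simply gone. A double keeps it. */
    float  far_f = 100001.003f;
    double far_d = 100001.003;

    printf("%.6f\n", near_origin); /* ~1001.002991          */
    printf("%.6f\n", far_f);       /* 100001.000000, mm lost */
    printf("%.6f\n", far_d);       /* 100001.003000          */
    return 0;
}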
 
Have fun anyway...

Even if it hasn't been entirely decided yet, NVIDIA is expected to go through with the launch event of the GeForce GTX 400 series on March 26 at PAX 2010, but there won't be any benchmarks. Specifications and prices will be revealed, but the media embargo covering complete benchmarks and such won't lift until March 29th, the Monday of the week following PAX.

http://www.nordichardware.com/en/component/content/article/71-graphics/10910-nvidia-moving-media-embargo-to-march-29th.html
 
I don't think that anything beyond 10 decimals would have much meaning.
What I believe Chal is saying is that it's not the number of decimals per se that is the issue with SPFP, but rather that when you do operations on SPFP numbers you end up with a much lower-precision end result, which can be problematic.

You seem determined to simply not want to understand this issue. That's not constructive.
 
What I believe Chal is saying is that it's not the number of decimals per se that is the issue with SPFP, but rather that when you do operations on SPFP numbers you end up with a much lower-precision end result, which can be problematic.

You seem determined to simply not want to understand this issue. That's not constructive.

The main issue is FP addition (worst case c = a - b where a and b are close) and this is not really fixed with higher precision.
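A minimal C illustration of that worst case (values picked purely for the demo): both operands are perfectly representable to ~7 digits on their own, but the subtraction cancels the leading digits and only round-off survives.

Code:
#include <stdio.h>

int main(void)
{
    /* c = a - b with a and b close: the true answer is 1e-7 */
    float  af = 1.0000001f, bf = 1.0f;
    double ad = 1.0000001,  bd = 1.0;

    printf("float : %g\n", af - bf); /* ~1.19209e-07, off by ~19%   */
    printf("double: %g\n", ad - bd); /* ~1e-07, many digits correct */
    return 0;
}

Higher precision doesn't remove the cancellation, it just leaves far more correct digits behind afterwards.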
 
Zotac GTX 480 and GTX 470 Boxes

[images: photos of the Zotac GTX 480 and GTX 470 retail boxes]
http://www.pcpop.com/doc/0/510/510912.shtml
Translated
http://translate.google.com/transla...cpop.com/doc/0/510/510912.shtml&sl=auto&tl=en
 
GPGPU is an extension of gaming? ORLY? :|


Yes, it is. Where do you think the future of gaming is going? Raytracing, physics... the flexibility of GPGPU is also needed in future gaming environments. Instead of APIs driving the capabilities of chip design, it's going to become direct access to a low level without the API overheads we see with DX and OGL, just like on CPUs. It's easy to think about things only in the near term, but the push for GPGPU in games is very strong for this specific reason. From an art perspective, we are already getting close to the limits of texture-based game engines for visual aesthetics. If you have access to some of the models in games today and render them in an offline renderer, they truly look like the stuff we see in some of the biggest-budget movies.
 
The main issue is FP addition (worst case c = a - b where a and b are close) and this is not really fixed with higher precision.
No, it isn't fixed. It's just made dramatically less. Single precision has ~7 digits of accuracy, while double precision has ~16 digits of accuracy. That means that double precision will still remain accurate out to approximately one part in a billion in situations where single precision has lost all of its precision.
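Those digit counts fall straight out of the mantissa widths (24 and 53 bits); a quick check in C:

Code:
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    /* decimal digits = mantissa bits * log10(2) */
    printf("float : %2d bits -> %.1f digits\n",
           FLT_MANT_DIG, FLT_MANT_DIG * log10(2.0)); /* 24 -> 7.2  */
    printf("double: %2d bits -> %.1f digits\n",
           DBL_MANT_DIG, DBL_MANT_DIG * log10(2.0)); /* 53 -> 16.0 */
    return 0;
}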
 
Wow, that is such a clever use of the VLIW architecture!
Is this the case in Milkyway@home too? I know it has amazing performance on ATI hardware as well...
Guess I can help out here ;)
That is the ISA code of the shortest (and least efficient) integration kernel Milkyway@home uses, compiled from IL with Cat 8.12 (maybe some of the issues of that old driver which can be spotted here, wasting a cycle or two, went away with newer versions). It is just a quite streamlined (and really dumb, pure number-crunching) implementation that basically crams as many DP MADDs into the thing as possible (besides also doing square roots, exponentials and divisions, though it doesn't use the compiler-generated versions, even if they are available :rolleyes:).

Code:
; --------  Disassembly --------------------
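; Note: each double precision MUL_64/MULADD_64 below fills the four
; x/y/z/w slots of one VLIW bundle (.x/.y receive the two 32-bit
; halves of the result), while the t slot stays free for independent
; scalar work (the loop counter ADD, I_TO_F, RSQ_FF, ...).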
00 TEX: ADDR(336) CNT(2) VALID_PIX 
      0  SAMPLE R4, R0.xyxx, t4, s0  UNNORM(XYZW) 
      1  SAMPLE R5, R0.xyxx, t5, s0  UNNORM(XYZW) 
01 ALU: ADDR(32) CNT(4) 
      2  x: MOV         R1.x,  R0.y      
         y: MOV         R0.y,  0.0f      
         z: MOV         R1.z,  0.0f      
         t: MOV         R6.y,  R0.x      
02 TEX: ADDR(340) CNT(3) VALID_PIX 
      3  SAMPLE R0.xy__, R0.xyxx, t2, s0  UNNORM(XYZW) 
      4  SAMPLE R7, R1.xzxx, t0, s0  UNNORM(XYZW) 
      5  SAMPLE R8.xy__, R1.xzxx, t1, s0  UNNORM(XYZW) 
03 ALU: ADDR(36) CNT(10) KCACHE0(CB2:0-15) 
      6  x: MUL_64      R9.x,  R0.y,  KC0[0].y      
         y: MUL_64      R9.y,  R0.y,  KC0[0].y      
         z: MUL_64      ____,  R0.y,  KC0[0].y      
         w: MUL_64      ____,  R0.x,  KC0[0].x      
         t: MOV         R0.z,  0.0f      
      7  x: MOV         R10.x,  0.0f      
         y: MOV         R10.y,  0.0f      
         w: MOV         R0.w,  0.0f      
         t: MOV         R6.x,  0.0f      
      8  t: I_TO_F      R1.w,  KC0[4].x      
04 LOOP_DX10 i0 FAIL_JUMP_ADDR(10) 
    05 ALU_BREAK: ADDR(46) CNT(1) 
          9  x: PREDGT      ____,  R1.w,  R6.x      UPDATE_EXEC_MASK UPDATE_PRED 
    06 TEX: ADDR(346) CNT(1) VALID_PIX 
         10  SAMPLE R3, R6.xyxx, t3, s0  UNNORM(XYZW) 
    07 ALU: ADDR(47) CNT(121) KCACHE0(CB2:0-15) KCACHE1(CB0:0-15) 
         11  x: MULADD_64   T1.x,  R3.y,  R7.y,  KC0[3].y      
             y: MULADD_64   T1.y,  R3.y,  R7.y,  KC0[3].y      
             z: MULADD_64   ____,  R3.y,  R7.y,  KC0[3].y      
             w: MULADD_64   ____,  R3.x,  R7.x,  KC0[3].x      
             t: ADD         R6.x,  R6.x,  1.0f      
         12  x: MUL_64      T2.x,  R3.y,  R7.w      
             y: MUL_64      T2.y,  R3.y,  R7.w      
             z: MUL_64      ____,  R3.y,  R7.w      
             w: MUL_64      ____,  R3.x,  R7.z      
         13  x: ADD_64      R0.x,  PV12.y,  KC1[1].w      
             y: ADD_64      R0.y,  PV12.x,  KC1[1].z      
             z: ADD_64      T0.z,  T1.y,  KC1[0].w      
             w: ADD_64      T0.w,  T1.x,  KC1[0].z      
         14  x: MUL_64      T3.x,  R3.y,  R8.y      
             y: MUL_64      T3.y,  R3.y,  R8.y      
             z: MUL_64      ____,  R3.y,  R8.y      
             w: MUL_64      ____,  R3.x,  R8.x      
         15  x: MUL_64      T0.x,  KC1[0].y,  T0.w      
             y: MUL_64      T0.y,  KC1[0].y,  T0.w      
             z: MUL_64      ____,  KC1[0].y,  T0.w      
             w: MUL_64      ____,  KC1[0].x,  T0.z      
         16  z: ADD_64      T1.z,  T3.y,  KC1[2].w      
             w: ADD_64      T1.w,  T3.x,  KC1[2].z      
         17  x: MULADD_64   T0.x,  KC1[1].y,  R0.y,  T0.y      
             y: MULADD_64   T0.y,  KC1[1].y,  R0.y,  T0.y      
             z: MULADD_64   ____,  KC1[1].y,  R0.y,  T0.y      
             w: MULADD_64   ____,  KC1[1].x,  R0.x,  T0.x      
         18  x: MUL_64      T1.x,  T1.y,  T1.y      
             y: MUL_64      T1.y,  T1.y,  T1.y      
             z: MUL_64      ____,  T1.y,  T1.y      
             w: MUL_64      ____,  T1.x,  T1.x      
         19  x: MULADD_64   T0.x,  KC1[2].y,  T1.w,  T0.y      
             y: MULADD_64   T0.y,  KC1[2].y,  T1.w,  T0.y      
             z: MULADD_64   ____,  KC1[2].y,  T1.w,  T0.y      
             w: MULADD_64   ____,  KC1[2].x,  T1.z,  T0.x      
         20  x: MULADD_64   T1.x,  T2.y,  T2.y,  T1.y      
             y: MULADD_64   T1.y,  T2.y,  T2.y,  T1.y      
             z: MULADD_64   ____,  T2.y,  T2.y,  T1.y      
             w: MULADD_64   ____,  T2.x,  T2.x,  T1.x      
             t: MOV         T0.y, -PV19.y      
         21  x: MUL_64      ____,  T3.y,  T3.y      
             y: MUL_64      ____,  T3.y,  T3.y      
             z: MUL_64      ____,  T3.y,  T3.y      
             w: MUL_64      ____,  T3.x,  T3.x      
         22  x: MULADD_64   T2.x,  PV21.y,  KC0[2].y,  T1.y      
             y: MULADD_64   T2.y,  PV21.y,  KC0[2].y,  T1.y      
             z: MULADD_64   ____,  PV21.y,  KC0[2].y,  T1.y      
             w: MULADD_64   ____,  PV21.x,  KC0[2].x,  T1.x      
         23  x: MULADD_64   T1.x,  T0.y,  KC1[0].y,  T0.w      
             y: MULADD_64   T1.y,  T0.y,  KC1[0].y,  T0.w      
             z: MULADD_64   ____,  T0.y,  KC1[0].y,  T0.w      
             w: MULADD_64   ____,  T0.x,  KC1[0].x,  T0.z      
         24  x: F64_TO_F32  ____,  T2.y      
             y: F64_TO_F32  ____,  T2.x      
         25  x: MULADD_64   T3.x,  T0.y,  KC1[1].y,  R0.y      
             y: MULADD_64   T3.y,  T0.y,  KC1[1].y,  R0.y      
             z: MULADD_64   ____,  T0.y,  KC1[1].y,  R0.y      
             w: MULADD_64   ____,  T0.x,  KC1[1].x,  R0.x      
             t: RSQ_FF      T0.w,  PV24.x      
         26  x: MUL_64      ____,  T1.y,  T1.y      
             y: MUL_64      ____,  T1.y,  T1.y      
             z: MUL_64      ____,  T1.y,  T1.y      
             w: MUL_64      ____,  T1.x,  T1.x      
         27  x: MULADD_64   T3.x,  T3.y,  T3.y,  PV26.y      
             y: MULADD_64   T3.y,  T3.y,  T3.y,  PV26.y      
             z: MULADD_64   ____,  T3.y,  T3.y,  PV26.y      
             w: MULADD_64   ____,  T3.x,  T3.x,  PV26.x      
         28  x: MULADD_64   ____,  T0.y,  KC1[2].y,  T1.w      
             y: MULADD_64   ____,  T0.y,  KC1[2].y,  T1.w      
             z: MULADD_64   ____,  T0.y,  KC1[2].y,  T1.w      
             w: MULADD_64   ____,  T0.x,  KC1[2].x,  T1.z      
         29  x: MULADD_64   R0.x,  PV28.y,  PV28.y,  T3.y      
             y: MULADD_64   R0.y,  PV28.y,  PV28.y,  T3.y      
             z: MULADD_64   ____,  PV28.y,  PV28.y,  T3.y      
             w: MULADD_64   ____,  PV28.x,  PV28.x,  T3.x      
         30  x: F32_TO_F64  T3.x, -T0.w      
             y: F32_TO_F64  T3.y,  0.0f      
         31  x: MUL_64      ____,  PV30.y,  PV30.y      
             y: MUL_64      ____,  PV30.y,  PV30.y      
             z: MUL_64      ____,  PV30.y,  PV30.y      
             w: MUL_64      ____,  PV30.x,  PV30.x      
         32  x: MULADD_64   ____,  T2.y,  PV31.y,  (0xC0080000, -2.125f).x      
             y: MULADD_64   ____,  T2.y,  PV31.y,  (0xC0080000, -2.125f).x      
             z: MULADD_64   ____,  T2.y,  PV31.y,  (0xC0080000, -2.125f).x      
             w: MULADD_64   ____,  T2.x,  PV31.x,  0.0f      
         33  x: MUL_64      T3.x,  T3.y,  PV32.y      
             y: MUL_64      T3.y,  T3.y,  PV32.y      
             z: MUL_64      ____,  T3.y,  PV32.y      
             w: MUL_64      ____,  T3.x,  PV32.x      
         34  x: MUL_64      T2.x,  T2.y,  PV33.y      
             y: MUL_64      T2.y,  T2.y,  PV33.y      
             z: MUL_64      ____,  T2.y,  PV33.y      
             w: MUL_64      ____,  T2.x,  PV33.x      
         35  x: MUL_64      ____,  T3.y,  PV34.y      
             y: MUL_64      ____,  T3.y,  PV34.y      
             z: MUL_64      ____,  T3.y,  PV34.y      
             w: MUL_64      ____,  T3.x,  PV34.x      
         36  x: MULADD_64   ____,  PV35.y,  (0xBFB00000, -1.375f).y,  (0x3FE80000, 1.8125f).x      
             y: MULADD_64   ____,  PV35.y,  (0xBFB00000, -1.375f).y,  (0x3FE80000, 1.8125f).x      
             z: MULADD_64   ____,  PV35.y,  (0xBFB00000, -1.375f).y,  (0x3FE80000, 1.8125f).x      
             w: MULADD_64   ____,  PV35.x,  0.0f,  0.0f      
         37  x: MUL_64      T2.x,  PV36.y,  T2.y      
             y: MUL_64      T2.y,  PV36.y,  T2.y      
             z: MUL_64      ____,  PV36.y,  T2.y      
             w: MUL_64      ____,  PV36.x,  T2.x      
         38  x: ADD_64      T3.x,  PV37.y,  KC0[1].y      
             y: ADD_64      T3.y,  PV37.x,  KC0[1].x      
         39  x: MUL_64      ____,  T2.y,  PV38.y      
             y: MUL_64      ____,  T2.y,  PV38.y      
             z: MUL_64      ____,  T2.y,  PV38.y      
             w: MUL_64      ____,  T2.x,  PV38.x      
         40  x: MUL_64      ____,  PV39.y,  T3.y      
             y: MUL_64      ____,  PV39.y,  T3.y      
             z: MUL_64      ____,  PV39.y,  T3.y      
             w: MUL_64      ____,  PV39.x,  T3.x      
         41  x: MUL_64      R2.x,  PV40.y,  T3.y      
             y: MUL_64      R2.y,  PV40.y,  T3.y      
             z: MUL_64      ____,  PV40.y,  T3.y      
             w: MUL_64      ____,  PV40.x,  T3.x      
    08 ALU: ADDR(168) CNT(127) KCACHE0(CB1:0-15) 
         42  x: MUL_64      ____,  R0.y,  KC0[0].y      
             y: MUL_64      ____,  R0.y,  KC0[0].y      
             z: MUL_64      ____,  R0.y,  KC0[0].y      
             w: MUL_64      ____,  R0.x,  KC0[0].x      
         43  x: MUL_64      R0.x,  PV42.y,  (0x3FF71547, 1.930336833f).y      
             y: MUL_64      R0.y,  PV42.y,  (0x3FF71547, 1.930336833f).y      
             z: MUL_64      ____,  PV42.y,  (0x3FF71547, 1.930336833f).y      
             w: MUL_64      ____,  PV42.x,  (0x652B82FE, 5.062131550e22f).x      
         44  x: F64_TO_F32  T3.x,  R2.y      
             y: F64_TO_F32  ____,  R2.x      
             z: FRACT_64    T2.z,  PV43.y      
             w: FRACT_64    T2.w,  PV43.x      
             t: MOV         R2.y, -R2.y      
         45  x: ADD_64      T0.x,  PV44.w,  (0xBFE00000, -1.75f).x      
             y: ADD_64      T0.y,  PV44.z,  0.0f      
             t: MOV         R0.y, -R0.y      
         46  x: MUL_64      T1.x,  PV45.y,  PV45.y      
             y: MUL_64      T1.y,  PV45.y,  PV45.y      
             z: MUL_64      ____,  PV45.y,  PV45.y      
             w: MUL_64      ____,  PV45.x,  PV45.x      
             t: RCP_FF      T1.z,  T3.x      
         47  x: MULADD_64   T3.x,  PV46.y,  (0x3EF52B5C, 0.4788464308f).w,  (0x3F84AA4E, 1.036447287f).y      
             y: MULADD_64   T3.y,  PV46.y,  (0x3EF52B5C, 0.4788464308f).w,  (0x3F84AA4E, 1.036447287f).y      
             z: MULADD_64   ____,  PV46.y,  (0x3EF52B5C, 0.4788464308f).w,  (0x3F84AA4E, 1.036447287f).y      
             w: MULADD_64   ____,  PV46.x,  (0x4D1F00B9, 166726544.0f).z,  (0xB649A98F, -0.000003005003009f).x      
         48  x: MULADD_64   T2.x,  T1.y,  (0x3E8657CD, 0.2623886168f).w,  (0x3F33185F, 0.6995906234f).y      
             y: MULADD_64   T2.y,  T1.y,  (0x3E8657CD, 0.2623886168f).w,  (0x3F33185F, 0.6995906234f).y      
             z: MULADD_64   ____,  T1.y,  (0x3E8657CD, 0.2623886168f).w,  (0x3F33185F, 0.6995906234f).y      
             w: MULADD_64   ____,  T1.x,  (0xD06316DC, -1.523970458e10f).z,  (0x478FF1EB, 73699.83594f).x      
         49  x: MULADD_64   T3.x,  T1.y,  T3.y,  (0x3FE62E42, 1.798286676f).y      
             y: MULADD_64   T3.y,  T1.y,  T3.y,  (0x3FE62E42, 1.798286676f).y      
             z: MULADD_64   ____,  T1.y,  T3.y,  (0x3FE62E42, 1.798286676f).y      
             w: MULADD_64   ____,  T1.x,  T3.x,  (0xFEFA39EF, -1.663039037e38f).x      
         50  x: MULADD_64   T2.x,  T1.y,  T2.y,  (0x3FABF3E7, 1.343380809f).y      
             y: MULADD_64   T2.y,  T1.y,  T2.y,  (0x3FABF3E7, 1.343380809f).y      
             z: MULADD_64   ____,  T1.y,  T2.y,  (0x3FABF3E7, 1.343380809f).y      
             w: MULADD_64   ____,  T1.x,  T2.x,  (0x389CEFF9, 0.00007483358058f).x      
         51  x: MUL_64      T0.x,  T0.y,  T3.y      
             y: MUL_64      T0.y,  T0.y,  T3.y      
             z: MUL_64      ____,  T0.y,  T3.y      
             w: MUL_64      ____,  T0.x,  T3.x      
         52  x: MULADD_64   ____,  T1.y,  T2.y,  (0x3FF00000, 1.875f).x      
             y: MULADD_64   ____,  T1.y,  T2.y,  (0x3FF00000, 1.875f).x      
             z: MULADD_64   ____,  T1.y,  T2.y,  (0x3FF00000, 1.875f).x      
             w: MULADD_64   ____,  T1.x,  T2.x,  0.0f      
         53  x: MULADD_64   T3.x,  T0.y,  (0xBFE00000, -1.75f).x,  PV52.y      
             y: MULADD_64   T3.y,  T0.y,  (0xBFE00000, -1.75f).x,  PV52.y      
             z: MULADD_64   ____,  T0.y,  (0xBFE00000, -1.75f).x,  PV52.y      
             w: MULADD_64   ____,  T0.x,  0.0f,  PV52.x      
         54  x: F64_TO_F32  ____,  PV53.y      
             y: F64_TO_F32  ____,  PV53.x      
             w: MOV         T3.w, -PV53.y      
         55  z: F32_TO_F64  T0.z,  T1.z      
             w: F32_TO_F64  T0.w,  0.0f      
             t: RCP_FF      ____,  PV54.x      
         56  z: F32_TO_F64  T1.z,  PS55      
             w: F32_TO_F64  T1.w,  0.0f      
         57  x: MULADD_64   ____,  T3.w,  PV56.w,  (0x3FF00000, 1.875f).x      
             y: MULADD_64   ____,  T3.w,  PV56.w,  (0x3FF00000, 1.875f).x      
             z: MULADD_64   ____,  T3.w,  PV56.w,  (0x3FF00000, 1.875f).x      
             w: MULADD_64   ____,  T3.x,  PV56.z,  0.0f      
         58  x: MULADD_64   T2.x,  T1.w,  PV57.y,  T1.w      
             y: MULADD_64   T2.y,  T1.w,  PV57.y,  T1.w      
             z: MULADD_64   ____,  T1.w,  PV57.y,  T1.w      
             w: MULADD_64   ____,  T1.z,  PV57.x,  T1.z      
         59  x: MULADD_64   T1.x,  R2.y,  T0.w,  (0x3FF00000, 1.875f).x      
             y: MULADD_64   T1.y,  R2.y,  T0.w,  (0x3FF00000, 1.875f).x      
             z: MULADD_64   ____,  R2.y,  T0.w,  (0x3FF00000, 1.875f).x      
             w: MULADD_64   ____,  R2.x,  T0.z,  0.0f      
         60  x: MUL_64      R1.x,  T0.y,  T2.y      
             y: MUL_64      R1.y,  T0.y,  T2.y      
             z: MUL_64      ____,  T0.y,  T2.y      
             w: MUL_64      ____,  T0.x,  T2.x      
         61  x: MULADD_64   T1.x,  T0.w,  T1.y,  T0.w      
             y: MULADD_64   T1.y,  T0.w,  T1.y,  T0.w      
             z: MULADD_64   ____,  T0.w,  T1.y,  T0.w      
             w: MULADD_64   ____,  T0.z,  T1.x,  T0.z      
         62  z: ADD_64      T2.z,  T2.w,  R0.y      
             w: ADD_64      T2.w,  T2.z,  R0.x      
         63  x: MULADD_64   ____,  T3.w,  R1.y,  T0.y      
             y: MULADD_64   ____,  T3.w,  R1.y,  T0.y      
             z: MULADD_64   ____,  T3.w,  R1.y,  T0.y      
             w: MULADD_64   ____,  T3.x,  R1.x,  T0.x      
         64  x: MULADD_64   T2.x,  PV63.y,  T2.y,  R1.y      
             y: MULADD_64   T2.y,  PV63.y,  T2.y,  R1.y      
             z: MULADD_64   ____,  PV63.y,  T2.y,  R1.y      
             w: MULADD_64   ____,  PV63.x,  T2.x,  R1.x      
         65  x: F64_TO_F32  ____,  T2.w      
             y: F64_TO_F32  ____,  T2.z      
         66  x: MUL_64      T3.x,  R3.w,  T1.y      
             y: MUL_64      T3.y,  R3.w,  T1.y      
             z: MUL_64      ____,  R3.w,  T1.y      
             w: MUL_64      ____,  R3.z,  T1.x      
             t: F_TO_I      T2.w, -PV65.x      
         67  x: MULADD_64   T0.x,  R2.y,  PV66.y,  R3.w      
             y: MULADD_64   T0.y,  R2.y,  PV66.y,  R3.w      
             z: MULADD_64   ____,  R2.y,  PV66.y,  R3.w      
             w: MULADD_64   ____,  R2.x,  PV66.x,  R3.z      
         68  x: MULADD_64   T2.x,  T2.y,  (0x3FF6A09E, 1.926776648f).y,  (0x3FF6A09E, 1.926776648f).y      
             y: MULADD_64   T2.y,  T2.y,  (0x3FF6A09E, 1.926776648f).y,  (0x3FF6A09E, 1.926776648f).y      
             z: MULADD_64   ____,  T2.y,  (0x3FF6A09E, 1.926776648f).y,  (0x3FF6A09E, 1.926776648f).y      
             w: MULADD_64   ____,  T2.x,  (0x667F3BCD, 3.013266457e23f).x,  (0x667F3BCD, 3.013266457e23f).x      
         69  x: MULADD_64   ____,  T0.y,  T1.y,  T3.y      
             y: MULADD_64   ____,  T0.y,  T1.y,  T3.y      
             z: MULADD_64   ____,  T0.y,  T1.y,  T3.y      
             w: MULADD_64   ____,  T0.x,  T1.x,  T3.x      
         70  x: LDEXP_64    R2.x,  T2.y,  T2.w      
             y: LDEXP_64    R2.y,  T2.x,  T2.w      
             z: ADD_64      R0.z,  R0.w,  PV69.y      
             w: ADD_64      R0.w,  R0.z,  PV69.x      
         71  x: MULADD_64   R10.x,  R2.y,  R3.w,  R10.y      
             y: MULADD_64   R10.y,  R2.y,  R3.w,  R10.y      
             z: MULADD_64   ____,  R2.y,  R3.w,  R10.y      
             w: MULADD_64   ____,  R2.x,  R3.z,  R10.x      
09 ENDLOOP i0 PASS_JUMP_ADDR(5) 
10 ALU: ADDR(295) CNT(36) 
     72  x: MUL_64      ____,  R0.w,  R9.y      
         y: MUL_64      ____,  R0.w,  R9.y      
         z: MUL_64      T0.z,  R0.w,  R9.y      
         w: MUL_64      T0.w,  R0.z,  R9.x      
     73  x: MUL_64      T0.x,  R10.y,  R9.y      
         y: MUL_64      T0.y,  R10.y,  R9.y      
         z: MUL_64      ____,  R10.y,  R9.y      
         w: MUL_64      ____,  R10.x,  R9.x      
     74  x: ADD_64      R1.x,  R4.y,  T0.w      
         y: ADD_64      R1.y,  R4.x,  T0.z      
         z: ADD_64      R0.z,  R5.y,  PV73.y      VEC_120 
         w: ADD_64      R0.w,  R5.x,  PV73.x      VEC_120 
     75  x: MOV         ____,  PV74.z      
         y: MOV         ____, -PV74.w      
         z: MOV         ____,  PV74.x      
         t: MOV         ____, -PV74.y      
     76  x: ADD_64      ____,  PV75.y,  R5.y      
         y: ADD_64      ____,  PV75.x,  R5.x      
         z: ADD_64      ____,  PS75,  R4.y      VEC_021 
         w: ADD_64      ____,  PV75.z,  R4.x      VEC_021 
     77  x: ADD_64      ____,  PV76.y,  T0.y      
         y: ADD_64      ____,  PV76.x,  T0.x      
         z: ADD_64      ____,  PV76.w,  T0.w      
         w: ADD_64      ____,  PV76.z,  T0.z      
     78  x: ADD_64      R0.x,  R5.w,  PV77.y      
         y: ADD_64      R0.y,  R5.z,  PV77.x      
         z: ADD_64      R1.z,  R4.w,  PV77.w      VEC_120 
         w: ADD_64      R1.w,  R4.z,  PV77.z      VEC_120 
     79  x: MOV         R3.x,  R0.z      
         y: MOV         R3.y,  R0.w      
         z: MOV         R3.z,  PV78.x      
         w: MOV         R3.w,  PV78.y      
     80  x: MOV         R2.x,  R1.x      
         y: MOV         R2.y,  R1.y      
         z: MOV         R2.z,  R1.z      
         w: MOV         R2.w,  R1.w      
11 EXP_DONE: PIX0, R2  BRSTCNT(1) 
END_OF_PROGRAM
 
Those should all be the same on RV7xx and RV8xx though, IIRC, so that wouldn't explain why RV8xx is faster.
Evergreens have a bit-extract instruction, which can be used to do shifts of data larger than 32 bits more efficiently, i.e. at double the speed (the latency is halved) and with a third of the instructions (so in corner cases it may have 3x the throughput of an older GPU with the same number of units).

That is actually the reason why Collatz@home (which does shifts on 192-bit integers) is faster on Juniper than on an HD 4800 card. But effectively Evergreens are limited by the 32-bit integer multiplication speed there (and eventually memory bandwidth). Evergreens could go even faster if AMD enabled the 24-bit integer MUL in IL (it is not accessible to programmers right now, or has this changed lately?). In principle it should be possible for the four xyzw ALUs to be chained together (with the same data paths used for the single-cycle dot4) to deliver two 32-bit integer multiplications per cycle per VLIW unit (one from the four xyzw ALUs and the other from the 32-bit multiplier in the t unit). There was even a presentation mentioning this possibility.
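For illustration, a C sketch of why such an instruction helps on wide shifts (helper names are hypothetical, and this only mimics the arithmetic, not the actual IL/ISA encoding): each output word of a multi-word shift needs two shifts plus an OR without it, but only a single bit-align/extract with it.

Code:
#include <stdint.h>
#include <stdio.h>

/* What a single bit-align/bitfield-extract style op returns:
   the 32 bits starting s positions into the 64-bit pair hi:lo. */
static uint32_t funnel_shr(uint32_t hi, uint32_t lo, unsigned s)
{
    return (uint32_t)((((uint64_t)hi << 32) | lo) >> s);
}

/* Right-shift a 192-bit integer (w[0] = least significant word)
   by 0 < s < 32: one op per word instead of shift+shift+OR. */
static void shr192(uint32_t w[6], unsigned s)
{
    for (int i = 0; i < 5; i++)
        w[i] = funnel_shr(w[i + 1], w[i], s);
    w[5] >>= s;
}

int main(void)
{
    uint32_t w[6] = {0x89ABCDEFu, 0x01234567u, 0, 0, 0, 0};
    shr192(w, 4);
    printf("%08X %08X\n", w[1], w[0]); /* 00123456 789ABCDE */
    return 0;
}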
 
Update: VR-ZONE

Sorry, it should be 480 SP. My source got a little mixed up. They are too busy making the cards right now :D I heard the retail availability got delayed, but the launch still stands on March 26th for now. NV will decide tonight whether to push back the launch date. There will be enough cards to go around, I heard.

Updated with final clocks too:

GeForce GTX 480 : 480 SP, 700/1401/1848MHz core/shader/mem, 384-bit, 1536MB, 295W TDP, US$499

GeForce GTX 470 : 448 SP, 607/1215/1674MHz core/shader/mem, 320-bit, 1280MB, 225W TDP, US$349
http://forums.vr-zone.com/7790889-post30.html
 
If widespread availability of GF100 awaits viable B silicon, and that is some six months away, doesn't that put Nvidia a full year behind on their roadmap across the board? And give AMD a free year to catch up in the pro market?
 

just a try:
Code:
                  GTX 480  GTX 470  470/480   470 less  480 more
sps               480      448      93.33%     6.67%     7.14%  (= % TMU units)
core              700      607      86.71%    13.29%    15.32%  (= % ROP speed)
shader            1401     1215     86.72%    13.28%    15.31%  (= % TMU speed)
mem               1848     1674     90.58%     9.42%    10.39%
bus width         384      320      83.33%    16.67%    20.00%  (= % ROP units)
MB                1536     1280     83.33%    16.67%    20.00%
TDP               295      225      76.27%    23.73%    31.11%

pixel fillrate                      72.26%    27.74%    38.39%
texture fillrate                    80.94%    19.06%    23.54%
flops                               80.94%    19.06%    23.54%
bandwidth                           75.49%    24.51%    32.47%
"average"                           77.41%    22.59%    29.49%
 