AMD: R9xx Speculation

Why only 16 writes? I see a DS_INST_WRITE2 opcode in the ISA.
I haven't really looked at that before. It seems the two write addresses are linked by the specified offset, so it's not two general-purpose writes.

Jawed
 
But those are structured differently from the multiplies in the ALU units. As you mentioned to me, you can do two multiplies in series in the other ALUs, so the first multiplication starts very soon. In the T units, you have to get the LUT value and do a partial square and cube before starting the multiply. The LUT coefficients don't need full IEEE precision for those multiplies, and they don't need another multiply after that (so they are further down the pipeline than in the other ALUs). Note how the other ALUs don't let you do an add after two serial muls, so that's not a viable path to accommodate the square/cube.

The structure is totally different. Look at Fig. 7 in the patent you linked to.
The only observation I'm making is that in the current T this sequence of multiplies and adds is possible. The longest chain of instructions is 3 multiplies and an add.
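
To make the "3 multiplies and an add" chain concrete, here is a rough sketch of a table-driven cubic approximation of sin. It is only an illustration of the dependency chain, not the hardware's actual scheme: the table size, the Taylor coefficients and the names `SEGMENTS`, `TABLE` and `lut_sin` are all assumptions of mine, and it uses full-precision floats rather than the reduced-width multiplies discussed here.

```python
import math

SEGMENTS = 64                              # hypothetical table size
SEG_WIDTH = (math.pi / 2) / SEGMENTS       # width of one table segment


def make_table():
    """Per-segment cubic Taylor coefficients of sin (an assumed scheme)."""
    table = []
    for i in range(SEGMENTS):
        x0 = i * SEG_WIDTH
        s, c = math.sin(x0), math.cos(x0)
        # sin(x0 + d) ~= s + c*d - (s/2)*d^2 - (c/6)*d^3
        table.append((s, c, -s / 2.0, -c / 6.0))
    return table


TABLE = make_table()


def lut_sin(x):
    """Approximate sin(x) for 0 <= x < pi/2 from the lookup table."""
    i = min(int(x / SEG_WIDTH), SEGMENTS - 1)
    d = x - i * SEG_WIDTH                  # offset inside the segment
    c0, c1, c2, c3 = TABLE[i]
    d2 = d * d                             # multiply 1 of the longest chain
    d3 = d2 * d                            # multiply 2 (depends on multiply 1)
    t3 = c3 * d3                           # multiply 3 (depends on multiply 2)
    t1 = c1 * d                            # independent of the chain
    t2 = c2 * d2                           # depends only on multiply 1
    return c0 + t1 + t2 + t3               # the final add


if __name__ == "__main__":
    print(lut_sin(0.5), math.sin(0.5))
```

In this sketch the square, the cube and the coefficient multiply are the only operations that must happen one after another; the other two products can overlap with them.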


Looking at the 3 multiplies in sequence, I can think of 2 reasons why the timing could be tight:
  1. reduced bitness (i.e. very low latency across any multiplier, being relatively small in area) since these are 16-bit, 12-bit and 12-bit
  2. the fact that these multiplies are successive (and specially formatted) means that all the conventional "in-between" stuff between successive instructions in a conventional pipeline is not required: these multiplies are cheek-by-jowl in time
Now it might be the case that these two reasons, alone, are enough to invalidate the possibility of emulating this in the "bigger, more conventional" ALU of a VLIW XYZW configuration.

Or it might be that neither of these factors changes appreciably when serialising multiplies across the lanes of the VLIW ALU.

But the point is: I don't know.

It still needs a cycle. The squaring and cubing probably need two.
Don't forget that in the existing T the pipeline's 8 cycles need to produce a normalised floating-point result. The add isn't the end of the story. The timing is super-tight. If you count 2 cycles per multiply then there's not enough time to do all of: pre-processing, LUTs, add and the conversion.
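
The budget can be tallied in a few lines. This is only back-of-envelope arithmetic under my own assumptions: the remaining stages are taken not to overlap and to cost at least one cycle each, and the names below are made up for the illustration.

```python
PIPELINE_CYCLES = 8        # T pipeline depth quoted above
CHAIN_MULTIPLIES = 3       # longest multiply chain quoted above
OTHER_STAGES = ["pre-processing", "LUT lookup", "add", "float conversion"]
MIN_CYCLES_PER_STAGE = 1   # assumption: no stage is free, none overlap


def cycles_left(cycles_per_multiply):
    """Cycles remaining once the serial multiplies are accounted for."""
    return PIPELINE_CYCLES - CHAIN_MULTIPLIES * cycles_per_multiply


def fits(cycles_per_multiply):
    """True if the remaining stages fit in what the multiplies leave."""
    needed = len(OTHER_STAGES) * MIN_CYCLES_PER_STAGE
    return cycles_left(cycles_per_multiply) >= needed


if __name__ == "__main__":
    for cpm in (1, 2):
        print(cpm, cycles_left(cpm), fits(cpm))
```

At 2 cycles per multiply the chain alone takes 6 of the 8 cycles, which leaves 2 cycles for four further stages.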

What makes you think the T-unit idling is responsible for the last 20-25%?
I never said it was.

Curiously there's a GPU flag for whether instructions should issue to T or to one of XYZW when there is equal choice between the two.

Or that the other ALUs are idling while the T-unit is working?
This is normal.

You should spend some time looking at the way code actually runs on the hardware.

It absolutely will cost performance.
I didn't say it wouldn't cost performance. I said the typical best-case utilisation of shaders is such that it will be unaffected.

Shaders such as Rightmark 4.0's Mineral shader (Fire is too long to post :cry: ):

Code:
; --------  Disassembly --------------------
00 ALU: ADDR(32) CNT(124) KCACHE0(CB1:0-15) KCACHE1(CB0:0-15) 
      0  x: INTERP_XY   R22.x,  R0.y,  Param0.x      VEC_210 
         y: INTERP_XY   R22.y,  R0.x,  Param0.x      VEC_210 
         z: INTERP_XY   ____,  R0.y,  Param0.x      VEC_210 
         w: INTERP_XY   ____,  R0.x,  Param0.x      VEC_210 
      1  x: MUL_e       ____,  PV0.x,  (0x43000000, 128.0f).x      
         y: MUL_e       ____,  PV0.x,  (0x42800000, 64.0f).y      
         z: MUL_e       ____,  PV0.y,  (0x42800000, 64.0f).y      
         w: MUL_e       ____,  PV0.y,  (0x43000000, 128.0f).x      
         t: MOV*2       R24.z,  KC0[0].z      
      2  x: FLOOR       ____,  PV1.z      
         y: FLOOR       ____,  PV1.y      
         z: FLOOR       ____,  PV1.x      
         w: FLOOR       ____,  PV1.w      
         t: MUL_e       R46.z,  R22.x,  (0x42F00000, 120.0f).x      
      3  x: MULADD_e    T1.x,  PV2.x,  (0x3C800000, 0.015625f).x,  KC1[0].y      
         y: MULADD_e    ____,  PV2.y,  (0x3C800000, 0.015625f).x,  KC1[0].x      
         z: MULADD_e    T2.z,  PV2.w,  (0x3C800000, 0.015625f).x,  KC1[0].y      
         w: MULADD_e    T1.w,  PV2.z,  (0x3C800000, 0.015625f).x,  KC1[0].x      
         t: MOV         R53.w,  (0x3F800000, 1.0f).y      
      4  x: ADD         T3.x,  PV3.y,  KC1[1].x      
         y: MUL_e       T0.y,  PV3.w,  (0x49742400, 1000000.0f).x      
         z: MUL_e       ____,  PV3.y,  (0x49742400, 1000000.0f).x      
         w: ADD         R2.w,  PV3.w,  KC1[1].x      
         t: ADD         T2.x,  R22.x, -PV3.y      
      5  x: ADD         R8.x,  PV4.x,  KC1[2].x      
         y: MUL_e       T1.y,  PV4.x,  (0x49742400, 1000000.0f).x      
         z: MUL_e       T0.z,  PV4.w,  (0x49742400, 1000000.0f).x      
         w: ADD         R4.w,  PV4.w,  KC1[2].x      
         t: F_TO_I      T2.y,  PV4.z      
      6  x: ADD         R11.x,  PV5.x,  KC1[3].x      
         y: MUL_e       T0.y,  PV5.w,  (0x49742400, 1000000.0f).x      
         z: MUL_e       T1.z,  PV5.x,  (0x49742400, 1000000.0f).x      
         w: ADD         R6.w,  PV5.w,  KC1[3].x      
         t: F_TO_I      T0.w,  T0.y      
      7  x: ADD         R12.x,  PV6.x,  KC1[4].x      
         y: MUL_e       T3.y,  PV6.x,  (0x49742400, 1000000.0f).x      
         z: MUL_e       T3.z,  PV6.w,  (0x49742400, 1000000.0f).x      
         w: ADD         R7.w,  PV6.w,  KC1[4].x      
         t: MULLO_INT   ____,  T2.y,  T2.y      
      8  x: ADD         R13.x,  PV7.x,  KC1[5].x      
         y: MUL_e       R2.y,  PV7.w,  (0x49742400, 1000000.0f).x      
         z: MUL_e       R0.z,  PV7.x,  (0x49742400, 1000000.0f).x      
         w: ADD_INT     ____,  T2.y,  PS7      
         t: F_TO_I      T0.x,  T1.y      
      9  x: MUL_e       R7.x,  PV8.x,  (0x49742400, 1000000.0f).x      
         y: ADD         R8.y,  R7.w,  KC1[5].x      
         z: ASHR        ____,  PV8.w,  (0x00000010, 2.242077543e-44f).y      
         w: ADD         R11.w,  PV8.x,  KC1[6].x      
         t: MULLO_INT   ____,  T0.w,  T0.w      
     10  x: AND_INT     R4.x,  PV9.z,  (0x00007FFF, 4.591634678e-41f).x      
         y: ADD_INT     T1.y,  T0.w,  PS9      
         z: MUL_e       R3.z,  PV9.y,  (0x49742400, 1000000.0f).y      
         w: ADD         R13.w,  PV9.y,  KC1[6].x      
         t: F_TO_I      T2.y,  T0.z      
     11  x: INTERP_ZW   ____,  R0.y,  Param0.x      VEC_210 
         y: INTERP_ZW   ____,  R0.x,  Param0.x      VEC_210 
         z: INTERP_ZW   R47.z,  R0.y,  Param0.x      VEC_210 
         w: INTERP_ZW   ____,  R0.x,  Param0.x      VEC_210 
     12  x: MUL_e       ____,  PV11.z,  (0x43000000, 128.0f).x      
         y: ASHR        ____,  T1.y,  (0x00000010, 2.242077543e-44f).y      
         z: ADD         T0.z,  R22.y, -T1.x      VEC_102 
         w: MUL_e       ____,  PV11.z,  (0x42800000, 64.0f).z      
         t: MULLO_INT   ____,  T0.x,  T0.x      
     13  x: FLOOR       T0.x,  PV12.x      
         y: AND_INT     R4.y,  PV12.y,  (0x00007FFF, 4.591634678e-41f).x      
         z: ADD_INT     ____,  T0.x,  PS12      
         w: FLOOR       ____,  PV12.w      
         t: F_TO_I      T1.y,  T1.z      
     14  x: ADD         T1.x,  T1.x,  KC1[1].y      
         y: ASHR        ____,  PV13.z,  (0x00000010, 2.242077543e-44f).x      
         z: MUL_e       R4.z,  R11.w,  (0x49742400, 1000000.0f).y      
         w: MULADD_e    T2.w,  PV13.w,  (0x3C800000, 0.015625f).z,  KC1[0].z      
         t: MULLO_INT   ____,  T2.y,  T2.y      
     15  x: AND_INT     R9.x,  PV14.y,  (0x00007FFF, 4.591634678e-41f).x      
         y: ADD         R21.y,  R11.w,  KC1[7].x      
         z: MULADD_e    R1.z,  T0.x,  (0x3C800000, 0.015625f).y,  KC1[0].z      
         w: ADD_INT     ____,  T2.y,  PS14      
         t: F_TO_I      T0.w,  T0.y      
     16  x: ADD         R2.x,  T2.z,  KC1[1].y      
         y: ADD         R1.y,  R22.y, -T2.z      
         z: ASHR        ____,  PV15.w,  (0x00000010, 2.242077543e-44f).x      
         w: ADD         R1.w,  R22.x, -T1.w      
         t: MULLO_INT   ____,  T1.y,  T1.y      
     17  x: MUL_e       R10.x,  R13.w,  (0x49742400, 1000000.0f).x      
         y: AND_INT     R6.y,  PV16.z,  (0x00007FFF, 4.591634678e-41f).y      
         z: ADD         R21.z,  R13.w,  KC1[7].x      
         w: ADD_INT     ____,  T1.y,  PS16      
         t: F_TO_I      R1.x,  T3.y      
     18  x: ADD         R5.x,  R47.z, -T2.w      VEC_102 
         y: MUL_e       R31.y,  T2.x,  (0x42800000, 64.0f).x      
         z: ASHR        T0.z,  PV17.w,  (0x00000010, 2.242077543e-44f).y      
         w: MUL_e       R23.w,  T0.z,  (0x42800000, 64.0f).x      
         t: MULLO_INT   ____,  T0.w,  T0.w      
     19  x: ADD         R6.x,  R22.x, -T3.x      
         y: ADD_INT     ____,  T0.w,  PS18      
         z: ADD         R2.z,  T2.w,  KC1[1].z      VEC_120 
         w: ADD         R3.w,  R22.y, -T1.x      VEC_021 
         t: F_TO_I      R3.y,  T3.z      
     20  x: ASHR        R3.x,  PV19.y,  (0x00000010, 2.242077543e-44f).x      
         y: ADD         R5.y,  T1.x,  KC1[2].y      
         z: AND_INT     R5.z,  T0.z,  (0x00007FFF, 4.591634678e-41f).y      
         w: MUL_e       R5.w,  R21.y,  (0x49742400, 1000000.0f).z      
         t: MULLO_INT   R0.w,  R1.x,  R1.x      
01 ALU: ADDR(156) CNT(125) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15) 
     21  x: ADD         T1.x,  R47.z, -R1.z      
         y: MUL_e       R32.y,  R1.w,  (0x42800000, 64.0f).x      
         z: ADD_INT     ____,  R1.x,  R0.w      
         w: MUL_e       R25.w,  R1.y,  (0x42800000, 64.0f).x      
         t: F_TO_I      T1.y,  R0.z      
     22  x: ADD         T3.x,  R22.x, -R2.w      
         y: ASHR        T3.y,  PV21.z,  (0x00000010, 2.242077543e-44f).x      
         z: ADD         T0.z,  R22.y, -R2.x      
         w: ADD         T2.w,  R1.z,  KC0[1].z      
         t: MULLO_INT   ____,  R3.y,  R3.y      
     23  x: AND_INT     T5.x,  R3.x,  (0x00007FFF, 4.591634678e-41f).x      
         y: MUL_e       T4.y,  R21.z,  (0x49742400, 1000000.0f).y      
         z: ADD         T3.z,  R2.x,  KC0[2].y      VEC_120 
         w: ADD_INT     ____,  R3.y,  PS22      
         t: F_TO_I      T0.w,  R2.y      
     24  x: MIN_DX10    R3.x,  |R23.w|,  1.0f      
         y: MIN_DX10    R1.y,  |R31.y|,  1.0f      
         z: ASHR        T2.z,  PV23.w,  (0x00000010, 2.242077543e-44f).x      
         w: MUL_e       R24.w,  R5.x,  (0x42800000, 64.0f).y      
         t: U_TO_F      ____,  R4.x      
     25  x: ADD         T0.x,  R47.z, -R2.z      
         y: MUL_e       R36.y,  R6.x,  (0x42800000, 64.0f).x      
         z: MUL_e       R25.z,  R3.w,  (0x42800000, 64.0f).x      
         w: MUL_e       T1.w,  PS24,  (0x38000100, 0.00003051850945f).y      
         t: MULLO_INT   ____,  T1.y,  T1.y      
     26  x: ADD         T4.x,  R2.z,  KC0[2].z      
         y: ADD         T0.y,  R22.x, -R8.x      
         z: MUL_e       R13.z,  R24.z,  PV25.w      VEC_120 
         w: ADD_INT     ____,  T1.y,  PS25      
         t: F_TO_I      T2.x,  R7.x      
     27  x: ADD         T1.x,  R5.y,  KC0[3].y      
         y: MUL_e       R33.y,  T1.x,  (0x42800000, 64.0f).x      
         z: ASHR        T1.z,  PV26.w,  (0x00000010, 2.242077543e-44f).y      
         w: ADD         T3.w,  R22.y, -R5.y      VEC_102 
         t: U_TO_F      ____,  R4.y      
     28  x: AND_INT     R7.x,  T3.y,  (0x00007FFF, 4.591634678e-41f).x      
         y: MUL_e       T2.y,  PS27,  (0x38000100, 0.00003051850945f).y      
         z: MIN_DX10    R1.z,  |R32.y|,  1.0f      VEC_120 
         w: MIN_DX10    R1.w,  |R25.w|,  1.0f      
         t: MULLO_INT   ____,  T0.w,  T0.w      
     29  x: MUL_e       R14.x,  R24.z,  PV28.y      
         y: ADD_INT     T3.y,  T0.w,  PS28      
         z: MUL_e       R26.z,  T3.x,  (0x42800000, 64.0f).x      
         w: ADD         T0.w,  R47.z, -T2.w      VEC_120 
         t: F_TO_I      T1.y,  R3.z      
     30  x: ADD         T3.x,  T2.w,  KC0[2].z      
         y: MUL_e       R37.y,  T0.z,  (0x42800000, 64.0f).x      
         z: ADD         T0.z,  R22.y, -T3.z      
         w: ADD         T2.w,  R22.x, -R4.w      
         t: U_TO_F      ____,  R9.x      
     31  x: ADD         R9.x,  T3.z,  KC0[3].y      
         y: AND_INT     R4.y,  T2.z,  (0x00007FFF, 4.591634678e-41f).x      VEC_120 
         z: MUL_e       R3.z,  PS30,  (0x38000100, 0.00003051850945f).y      
         w: ASHR        R4.w,  T3.y,  (0x00000010, 2.242077543e-44f).z      
         t: MULLO_INT   ____,  T2.x,  T2.x      
     32  x: MUL         R2.x,  R13.z,  (0x3E22F983, 0.1591549367f).x      
         y: MUL_e       R20.y,  KC1[0].z,  T1.w      
         z: ADD_INT     T2.z,  T2.x,  PS31      
         w: MIN_DX10    R9.w,  |R24.w|,  1.0f      
         t: F_TO_I      T3.y,  R4.z      
     33  x: MUL_e       R17.x,  R24.z,  R3.z      
         y: MUL_e       R10.y,  R3.x,  R3.x      
         z: MUL_e       R7.z,  R1.y,  R1.y      
         w: MUL_e       R27.w,  T0.x,  (0x42800000, 64.0f).x      VEC_120 
         t: U_TO_F      ____,  R6.y      
     34  x: ADD         R8.x,  R47.z, -T4.x      
         y: MIN_DX10    R13.y,  |R25.z|,  1.0f      VEC_120 
         z: MIN_DX10    R0.z,  |R36.y|,  1.0f      
         w: MUL_e       R0.w,  PS33,  (0x38000100, 0.00003051850945f).x      
         t: MULLO_INT   ____,  T1.y,  T1.y      
     35  x: ADD         R6.x,  T4.x,  KC0[3].z      
         y: MUL_e       R39.y,  T0.y,  (0x42800000, 64.0f).x      
         z: MUL_e       R28.z,  T3.w,  (0x42800000, 64.0f).x      
         w: ADD_INT     R3.w,  T1.y,  PS34      VEC_120 
         t: F_TO_I      T3.w,  R10.x      
     36  x: ADD         R10.x,  R22.x, -R11.x      
         y: ADD         R5.y,  R22.y, -T1.x      VEC_021 
         z: ADD         R5.z,  T1.x,  KC0[4].y      VEC_201 
         w: AND_INT     R2.w,  T1.z,  (0x00007FFF, 4.591634678e-41f).x      
         t: U_TO_F      ____,  R5.z      
     37  x: MUL_e       R18.x,  KC1[0].z,  T2.y      VEC_102 
         y: ASHR        R2.y,  T2.z,  (0x00000010, 2.242077543e-44f).x      
         z: MUL         R2.z,  R14.x,  (0x3E22F983, 0.1591549367f).y      
         w: MUL_e       R8.w,  PS36,  (0x38000100, 0.00003051850945f).z      
         t: MULLO_INT   ____,  T3.y,  T3.y      
     38  x: MIN_DX10    R1.x,  |R33.y|,  1.0f      
         y: MUL_e       R11.y,  R1.z,  R1.z      
         z: MUL_e       R11.z,  R1.w,  R1.w      
         w: ADD_INT     R5.w,  T3.y,  PS37      VEC_120 
         t: F_TO_I      R11.x,  R5.w      
     39  x: MUL_e       R23.x,  T0.w,  (0x42800000, 64.0f).x      
         y: MIN_DX10    R12.y,  |R26.z|,  1.0f      
         z: MUL_e       R20.z,  R24.z,  R0.w      VEC_120 
         w: MIN_DX10    R10.w,  |R37.y|,  1.0f      
         t: U_TO_F      ____,  T5.x      
     40  x: ADD         R4.x,  R47.z, -T3.x      
         y: MUL_e       R7.y,  PS39,  (0x38000100, 0.00003051850945f).x      
         z: MUL_e       R29.z,  T2.w,  (0x42800000, 64.0f).y      
         w: MUL_e       R28.w,  T0.z,  (0x42800000, 64.0f).y      VEC_120 
         t: MULLO_INT   ____,  T3.w,  T3.w      
     41  x: ADD         R5.x,  R22.x, -R6.w      
         y: ADD_INT     R3.y,  T3.w,  PS40      
         z: ADD         R4.z,  R22.y, -R9.x      
         w: ADD         R6.w,  T3.x,  KC0[3].z      VEC_201 
         t: F_TO_I      R6.y,  T4.y      
02 ALU: ADDR(281) CNT(125) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15) 
     42  x: ASHR        T5.x,  R3.w,  (0x00000010, 2.242077543e-44f).x      
         y: SETGT       T4.y,  |R2.x|,  (0x42480000, 50.0f).y      
         z: ADD         T0.z,  R9.x,  KC0[4].y      VEC_120 
         w: AND_INT     T2.w,  R4.w,  (0x00007FFF, 4.591634678e-41f).z      VEC_120 
         t: U_TO_F      T3.w,  R7.x      
     43  x: MUL_e       R9.x,  R9.w,  R9.w      
         y: MUL         R9.y,  R20.y,  (0x3E22F983, 0.1591549367f).x      
         z: FRACT       T2.z,  R2.x      
         w: MUL_e       T0.w,  R1.y,  R7.z      VEC_120 
         t: MULLO_INT   ____,  R11.x,  R11.x      
     44  x: MUL_e       R21.x,  KC1[0].z,  R3.z      
         y: MUL         R4.y,  R17.x,  (0x3E22F983, 0.1591549367f).x      
         z: ADD_INT     T4.z,  R11.x,  PS43      VEC_120 
         w: MUL_e       T1.w,  R3.x,  R10.y      VEC_201 
         t: U_TO_F      T3.x,  R4.y      
     45  x: MIN_DX10    R15.x,  |R27.w|,  1.0f      
         y: MUL_e       R28.y,  R24.z,  R8.w      
         z: MUL_e       R17.z,  R13.y,  R13.y      
         w: MUL_e       R15.w,  R0.z,  R0.z      VEC_120 
         t: MULLO_INT   ____,  R6.y,  R6.y      
     46  x: MUL_e       R28.x,  R8.x,  (0x42800000, 64.0f).x      
         y: MIN_DX10    R17.y,  |R39.y|,  1.0f      
         z: MIN_DX10    R18.z,  |R28.z|,  1.0f      
         w: ADD_INT     R3.w,  R6.y,  PS45      VEC_120 
         t: MUL_e       R14.z,  T3.w,  (0x38000100, 0.00003051850945f).y      
     47  x: ADD         T2.x,  R47.z, -R6.x      
         y: MUL_e       R42.y,  R10.x,  (0x42800000, 64.0f).x      
         z: MUL_e       R34.z,  R5.y,  (0x42800000, 64.0f).x      
         w: ADD         T4.w,  R6.x,  KC0[4].z      VEC_120 
         t: U_TO_F      T6.x,  R2.w      
     48  x: ADD         T7.x,  R22.x, -R12.x      
         y: ADD         T2.y,  R5.z,  KC0[5].y      
         z: AND_INT     T1.z,  R2.y,  (0x00007FFF, 4.591634678e-41f).x      
         w: ADD         T3.w,  R22.y, -R5.z      VEC_120 
         t: ASHR        T3.z,  R5.w,  (0x00000010, 2.242077543e-44f).y      
     49  x: FRACT       T1.x,  R2.z      
         y: SETGT       T3.y,  |R2.z|,  (0x42480000, 50.0f).x      
         z: MUL_e       R19.z,  R1.x,  R1.x      
         w: MUL         R5.w,  R18.x,  (0x3E22F983, 0.1591549367f).y      VEC_120 
         t: MUL_e       T0.x,  R1.z,  R11.y      
     50  x: MUL_e       T4.x,  R1.w,  R11.z      
         y: MUL_e       R29.y,  KC1[0].z,  R0.w      
         z: MUL         R3.z,  R20.z,  (0x3E22F983, 0.1591549367f).x      
         w: MIN_DX10    R16.w,  |R23.x|,  1.0f      
         t: MUL_e       R18.y,  R12.y,  R12.y      
     51  x: MUL_e       R16.x,  R10.w,  R10.w      
         y: MUL_e       R30.y,  R24.z,  R7.y      
         z: MUL_e       R30.z,  R4.x,  (0x42800000, 64.0f).x      
         w: MIN_DX10    R17.w,  |R29.z|,  1.0f      VEC_120 
         t: MIN_DX10    R19.y,  |R28.w|,  1.0f      
     52  x: ADD         R5.x,  R47.z, -R6.w      
         y: MUL_e       R43.y,  R5.x,  (0x42800000, 64.0f).x      
         z: MUL_e       R35.z,  R4.z,  (0x42800000, 64.0f).x      VEC_120 
         w: MUL_e       R12.w,  T3.x,  (0x38000100, 0.00003051850945f).y      VEC_120 
         t: ADD         R5.y,  R6.w,  KC0[4].z      
     53  x: ADD         R4.x,  T0.z,  KC0[5].y      
         y: ADD         R2.y,  R22.y, -T0.z      
         z: AND_INT     R5.z,  T5.x,  (0x00007FFF, 4.591634678e-41f).x      
         w: ADD         R7.w,  R22.x, -R7.w      VEC_102 
         t: U_TO_F      R4.z,  T2.w      
     54  x: ASHR        R2.x,  R3.y,  (0x00000010, 2.242077543e-44f).x      
         y: CNDE_INT    R16.y,  T4.y,  R2.x,  T2.z      VEC_120 
         z: MULADD_e    R12.z,  R11.z,  R1.w,  T4.x      VEC_201 
         w: CNDE_INT    R19.w,  T3.y,  R2.z,  T1.x      VEC_210 
         t: MUL_e       R34.y,  KC1[0].z,  R8.w      
     55  x: MULADD_e    R6.x,  R10.y,  R3.x,  T1.w      
         y: MUL_e       R3.y,  R9.w,  R9.x      VEC_102 
         z: MULADD_e    R8.z,  R7.z,  R1.y,  T0.w      VEC_120 
         w: MULADD_e    R6.w,  R11.y,  R1.z,  T0.x      VEC_102 
         t: MUL_e       R24.y,  R18.z,  R18.z      VEC_102 
     56  x: SETGT       R3.x,  |R9.y|,  (0x42480000, 50.0f).x      
         y: FRACT       R1.y,  R4.y      VEC_120 
         z: SETGT       R1.z,  |R4.y|,  (0x42480000, 50.0f).x      VEC_120 
         w: FRACT       R1.w,  R9.y      
         t: MUL         R7.x,  R21.x,  (0x3E22F983, 0.1591549367f).y      
     57  x: MUL_e       R12.x,  R0.z,  R15.w      
         y: MUL_e       R23.y,  R15.x,  R15.x      
         z: MUL_e       R9.z,  R13.y,  R17.z      
         w: MUL         R14.w,  R28.y,  (0x3E22F983, 0.1591549367f).x      VEC_120 
         t: MIN_DX10    R21.w,  |R28.x|,  1.0f      
     58  x: MIN_DX10    R19.x,  |R42.y|,  1.0f      
         y: MUL_e       R35.y,  R24.z,  R14.z      
         z: MUL_e       R22.z,  R17.y,  R17.y      VEC_120 
         w: MUL_e       R29.w,  T2.x,  (0x42800000, 64.0f).x      
         t: MIN_DX10    R25.y,  |R34.z|,  1.0f      
     59  x: ADD         R8.x,  R47.z, -T4.w      
         y: MUL_e       R45.y,  T7.x,  (0x42800000, 64.0f).x      
         z: MUL_e       R38.z,  T3.w,  (0x42800000, 64.0f).x      
         w: MUL_e       R18.w,  T6.x,  (0x38000100, 0.00003051850945f).y      VEC_120 
         t: ADD         R15.z,  T4.w,  KC0[5].z      
     60  x: ADD         R11.x,  R22.x, -R13.x      
         y: ADD         R6.y,  R22.y, -T2.y      
         z: ADD         R16.z,  T2.y,  KC0[6].y      VEC_120 
         w: AND_INT     R4.w,  T3.z,  (0x00007FFF, 4.591634678e-41f).x      
         t: U_TO_F      R2.w,  T1.z      
     61  x: FRACT       R13.x,  R5.w      
         y: ASHR        R14.y,  T4.z,  (0x00000010, 2.242077543e-44f).x      
         z: SETGT       R2.z,  |R5.w|,  (0x42480000, 50.0f).y      
         w: MUL_e       R0.w,  R1.x,  R19.z      
         t: SETGT       R8.w,  |R3.z|,  (0x42480000, 50.0f).y      
     62  x: MUL_e       R10.x,  R12.y,  R18.y      
         y: MUL         R15.y,  R29.y,  (0x3E22F983, 0.1591549367f).x      VEC_201 
         z: FRACT       R6.z,  R3.z      
         w: MUL_e       R22.w,  R16.w,  R16.w      
         t: MUL_e       R10.z,  R10.w,  R16.x      
03 ALU: ADDR(406) CNT(125) KCACHE0(CB1:0-15) KCACHE1(CB0:0-15) 
     63  x: MIN_DX10    R20.x,  |R30.z|,  1.0f      
         y: MUL_e       R26.y,  R17.w,  R17.w      
         z: MUL         T2.z,  R30.y,  (0x3E22F983, 0.1591549367f).x      
         w: MUL_e       R26.w,  KC0[0].z,  R7.y      
         t: MUL_e       R23.z,  R19.y,  R19.y      VEC_102 
     64  x: MUL_e       R31.x,  R5.x,  (0x42800000, 64.0f).x      
         y: MIN_DX10    R27.y,  |R43.y|,  1.0f      
         z: MUL_e       R4.z,  R24.z,  R12.w      
         w: MIN_DX10    R20.w,  |R35.z|,  1.0f      VEC_120 
         t: MUL_e       R7.y,  R4.z,  (0x38000100, 0.00003051850945f).y      
     65  x: ADD         T6.x,  R47.z, -R5.y      
         y: MUL_e       R46.y,  R7.w,  (0x42800000, 64.0f).x      
         z: MUL_e       R39.z,  R2.y,  (0x42800000, 64.0f).x      
         w: ADD         T3.w,  R5.y,  KC1[5].z      VEC_120 
         t: U_TO_F      T7.x,  R5.z      
     66  x: ADD         ____,  R22.x, -R8.y      
         y: ADD         T2.y,  R4.x,  KC1[6].y      VEC_120 
         z: ADD         T4.z,  R22.y, -R4.x      
         w: AND_INT     T4.w,  R2.x,  (0x00007FFF, 4.591634678e-41f).x      VEC_201 
         t: ASHR        T1.z,  R3.w,  (0x00000010, 2.242077543e-44f).y      
     67  x: CNDE_INT    R12.x,  R1.z,  R4.y,  R1.y      
         y: CNDE_INT    R1.y,  R2.z,  R5.w,  R13.x      VEC_120 
         z: CNDE_INT    R0.z,  R3.x,  R9.y,  R1.w      VEC_201 
         w: MULADD_e    T7.w,  R15.w,  R0.z,  R12.x      VEC_021 
         t: MUL_e       R32.x,  PV66.x,  (0x42800000, 64.0f).x      
     68  x: MULADD_e    R6.x,  R10.y,  (0x40400000, 3.0f).x, -R6.x      VEC_021 
         y: CNDE_INT    R10.y,  R8.w,  R3.z,  R6.z      
         z: MULADD_e    T5.z,  R9.x,  R9.w,  R3.y      
         w: MULADD_e    R6.w,  R11.y,  (0x40400000, 3.0f).x, -R6.w      VEC_102 
         t: SETGT       T0.w,  |R7.x|,  (0x42480000, 50.0f).y      
     69  x: MULADD_e    T5.x,  R18.y,  R12.y,  R10.x      
         y: MULADD_e    R12.y,  R7.z,  (0x40400000, 3.0f).x, -R8.z      VEC_021 
         z: MULADD_e    R7.z,  R19.z,  R1.x,  R0.w      VEC_201 
         w: FRACT       T1.w,  R7.x      VEC_120 
         t: SETGT       T5.w,  |R14.w|,  (0x42480000, 50.0f).y      
     70  x: MUL_e       T1.x,  R15.x,  R23.y      
         y: MULADD_e    R13.y,  R16.x,  R10.w,  R10.z      VEC_120 
         z: FRACT       T0.z,  R14.w      
         w: MULADD_e    T6.w,  R17.z,  R13.y,  R9.z      VEC_102 
         t: MUL         R3.y,  R34.y,  (0x3E22F983, 0.1591549367f).x      
     71  x: CNDE        T3.x,  R13.z,  R13.z,  R16.y      
         y: MUL_e       R11.y,  R21.w,  R21.w      
         z: MULADD_e    R11.z,  R11.z,  (0x40400000, 3.0f).x, -R12.z      VEC_102 
         w: MUL         R0.w,  R35.y,  (0x3E22F983, 0.1591549367f).y      
         t: MIN_DX10    R13.x,  |R29.w|,  1.0f      
     72  x: MUL_e       T4.x,  R18.z,  R24.y      
         y: MUL_e       R5.y,  KC0[0].z,  R14.z      
         z: MUL_e       R3.z,  R19.x,  R19.x      
         w: MUL_e       R10.w,  R17.y,  R22.z      VEC_021 
         t: MUL_e       R6.z,  R25.y,  R25.y      VEC_102 
     73  x: MIN_DX10    R4.x,  |R38.z|,  1.0f      
         y: MIN_DX10    R4.y,  |R45.y|,  1.0f      
         z: MUL_e       R2.z,  R24.z,  R18.w      VEC_102 
         w: MUL_e       R31.w,  R8.x,  (0x42800000, 64.0f).x      
         t: MUL_e       R8.z,  R2.w,  (0x38000100, 0.00003051850945f).y      
     74  x: ADD         R11.x,  R47.z, -R15.z      
         y: MUL_e       R47.y,  R11.x,  (0x42800000, 64.0f).x      
         z: MUL_e       R41.z,  R6.y,  (0x42800000, 64.0f).x      
         w: ADD         R4.w,  R15.z,  KC1[6].z      VEC_120 
         t: U_TO_F      R8.x,  R4.w      
     75  x: ADD         R1.x,  R22.x, -R11.w      
         y: ADD         R6.y,  R16.z,  KC1[7].y      
         z: AND_INT     R15.z,  R14.y,  (0x00007FFF, 4.591634678e-41f).x      
         w: ADD         R11.w,  R22.y, -R16.z      VEC_120 
         t: CNDE        R14.z,  R14.x,  R14.x,  R19.w      
     76  x: SETGT       T2.x,  |R15.y|,  (0x42480000, 50.0f).x      
         y: MUL_e       R14.y,  R16.w,  R22.w      
         z: SETGT       T3.z,  |T2.z|,  (0x42480000, 50.0f).x      
         w: FRACT       T2.w,  R15.y      
         t: FRACT       T0.x,  T2.z      
     77  x: MUL_e       R14.x,  R17.w,  R26.y      
         y: MUL_e       R9.y,  R20.x,  R20.x      
         z: MUL_e       R16.z,  R19.y,  R23.z      
         w: MUL         R2.w,  R26.w,  (0x3E22F983, 0.1591549367f).x      VEC_201 
         t: MUL_e       R38.y,  KC0[0].z,  R12.w      
     78  x: MUL         R3.x,  R4.z,  (0x3E22F983, 0.1591549367f).x      
         y: MUL_e       R8.y,  R27.y,  R27.y      
         z: MIN_DX10    R1.z,  |R31.x|,  1.0f      
         w: MUL_e       R5.w,  R20.w,  R20.w      
         t: MUL_e       R27.x,  R24.z,  R7.y      
     79  x: MIN_DX10    R2.x,  |R46.y|,  1.0f      
         y: MIN_DX10    R2.y,  |R39.z|,  1.0f      
         z: MUL_e       R40.z,  T6.x,  (0x42800000, 64.0f).x      
         w: MUL_e       R1.w,  T7.x,  (0x38000100, 0.00003051850945f).y      VEC_120 
         t: ADD         R12.w,  R47.z, -T3.w      
     80  x: ADD         R10.x,  T3.w,  KC1[6].z      
         y: MUL_e       R48.y,  T4.z,  (0x42800000, 64.0f).x      
         z: ADD         R13.z,  R22.y, -T2.y      
         w: ADD         R13.w,  R22.x, -R13.w      
         t: U_TO_F      R12.z,  T4.w      
     81  x: ADD         R7.x,  T2.y,  KC1[7].y      VEC_120 
         y: AND_INT     R15.y,  T1.z,  (0x00007FFF, 4.591634678e-41f).x      
         z: CNDE_INT    R9.z,  T0.w,  R7.x,  T1.w      
         w: CNDE_INT    R9.w,  T2.x,  R15.y,  T2.w      VEC_201 
         t: CNDE_INT    R3.w,  T3.z,  T2.z,  T0.x      
     82  x: MULADD_e    R15.x,  R9.x,  (0x40400000, 3.0f).x, -T5.z      
         y: CNDE_INT    R16.y,  T5.w,  R14.w,  T0.z      VEC_021 
         z: MULADD_e    R10.z,  R17.z,  (0x40400000, 3.0f).x, -T6.w      VEC_021 
         w: MULADD_e    R14.w,  R23.y,  R15.x,  T1.x      
         t: COS         R8.w,  R16.y      
     83  x: MULADD_e    R9.x,  R22.z,  R17.y,  R10.w      
         y: MULADD_e    R17.y,  R24.y,  R18.z,  T4.x      VEC_021 
         z: MULADD_e    R17.z,  R15.w,  (0x40400000, 3.0f).x, -T7.w      VEC_021 
         w: MULADD_e    R10.w,  R18.y,  (0x40400000, 3.0f).x, -T5.x      VEC_210 
         t: SIN         R18.z,  T3.x      
04 ALU: ADDR(531) CNT(121) KCACHE0(CB1:0-15) KCACHE1(CB0:0-15) 
     84  x: MULADD_e    R14.x,  R16.x,  (0x40400000, 3.0f).x, -R13.y      
         y: MULADD_e    R13.y,  R19.z,  (0x40400000, 3.0f).x, -R7.z      VEC_021 
         z: MULADD_e    T6.z,  R22.w,  R16.w,  R14.y      VEC_021 
         w: MULADD_e    T0.w,  R26.y,  R17.w,  R14.x      
         t: COS         T3.w,  R0.z      
     85  x: ADD*4       T4.x, -R12.y,  1.0f      
         y: CNDE        T0.y,  R20.y,  R20.y,  R0.z      VEC_102 
         z: ADD         T1.z, -R6.x,  1.0f      
         w: MULADD_e    T1.w,  R23.z,  R19.y,  R16.z      VEC_021 
         t: CNDE        T4.w,  R17.x,  R17.x,  R12.x      VEC_102 
     86  x: SETGT       T3.x,  |R3.y|,  (0x42480000, 50.0f).x      
         y: SETGT       T3.y,  |R0.w|,  (0x42480000, 50.0f).x      
         z: MUL_e       T3.z,  R21.w,  R11.y      VEC_120 
         w: FRACT       T7.w,  R3.y      
         t: FRACT       T4.y,  R0.w      
     87  x: MUL         R17.x,  R5.y,  (0x3E22F983, 0.1591549367f).x      
         y: MUL_e       R19.y,  R13.x,  R13.x      
         z: MUL_e       T2.z,  R19.x,  R3.z      VEC_120 
         w: MUL_e       T6.w,  R25.y,  R6.z      VEC_102 
         t: MUL_e       R40.y,  KC0[0].z,  R18.w      
     88  x: MUL         R16.x,  R2.z,  (0x3E22F983, 0.1591549367f).x      
         y: MUL_e       R20.y,  R4.x,  R4.x      
         z: MUL_e       R0.z,  R4.y,  R4.y      
         w: MIN_DX10    R18.w,  |R31.w|,  1.0f      
         t: MUL_e       R41.y,  R24.z,  R8.z      
     89  x: MUL_e       R5.x,  R8.x,  (0x38000100, 0.00003051850945f).x      
         y: MIN_DX10    R14.y,  |R41.z|,  1.0f      
         z: MIN_DX10    R16.z,  |R47.y|,  1.0f      
         w: MUL_e       R32.w,  R11.x,  (0x42800000, 64.0f).y      VEC_120 
         t: ADD         R8.x,  R47.z, -R4.w      
     90  x: ADD         R1.x,  R4.w,  KC1[7].z      
         y: MUL_e       R50.y,  R1.x,  (0x42800000, 64.0f).x      
         z: MUL_e       R42.z,  R11.w,  (0x42800000, 64.0f).x      VEC_120 
         w: ADD         R11.w,  R22.x, -R21.y      VEC_120 
         t: U_TO_F      R4.w,  R15.z      
     91  x: ADD*4       T1.x, -R6.w,  1.0f      
         y: ADD         R6.y,  R22.y, -R6.y      
         z: ADD         T4.z, -R11.z,  1.0f      
         w: CNDE        R6.w,  R18.x,  R18.x,  R1.y      
         t: COS         R21.y,  R19.w      
     92  x: CNDE        R11.x,  R20.z,  R20.z,  R10.y      
         y: FRACT       T2.y,  R2.w      
         z: SETGT       T0.z,  |R2.w|,  (0x42480000, 50.0f).x      
         w: MUL_e       T2.w,  R20.x,  R9.y      
         t: SIN         R18.x,  R14.z      
     93  x: MUL_e       R24.x,  R1.z,  R1.z      
         y: MUL         R1.y,  R38.y,  (0x3E22F983, 0.1591549367f).x      
         z: FRACT       T5.z,  R3.x      
         w: SETGT       T5.w,  |R3.x|,  (0x42480000, 50.0f).y      
         t: COS         R14.z,  R1.y      
     94  x: MUL_e       T5.x,  R20.w,  R5.w      
         y: MUL         R7.y,  R27.x,  (0x3E22F983, 0.1591549367f).x      
         z: MUL_e       T7.z,  R27.y,  R8.y      
         w: MUL_e       R7.w,  KC0[0].z,  R7.y      VEC_021 
         t: MIN_DX10    R25.x,  |R40.z|,  1.0f      
     95  x: MUL_e       R30.x,  R24.z,  R1.w      
         y: MUL_e       R49.y,  R12.w,  (0x42800000, 64.0f).x      
         z: MUL_e       R7.z,  R2.y,  R2.y      
         w: MUL_e       R19.w,  R2.x,  R2.x      
         t: MIN_DX10    R26.x,  |R32.x|,  1.0f      
     96  x: ADD         R6.x,  R47.z, -R10.x      
         y: MUL_e       R12.y,  R12.z,  (0x38000100, 0.00003051850945f).x      VEC_120 
         z: MUL_e       R43.z,  R13.w,  (0x42800000, 64.0f).y      
         w: MIN_DX10    R16.w,  |R48.y|,  1.0f      
         t: MUL_e       R52.y,  R13.z,  (0x42800000, 64.0f).y      
     97  x: ADD         R10.x,  R22.x, -R21.z      
         y: CNDE_INT    R15.y,  T0.z,  R2.w,  T2.y      VEC_021 
         z: ADD         R21.z,  R22.y, -R7.x      
         w: ADD         R2.w,  R10.x,  KC1[7].z      VEC_201 
         t: U_TO_F      R7.x,  R15.y      
     98  x: CNDE_INT    R3.x,  T3.y,  R0.w,  T4.y      
         y: CNDE_INT    R22.y,  T5.w,  R3.x,  T5.z      VEC_021 
         z: CNDE_INT    R13.z,  T3.x,  R3.y,  T7.w      
         w: MUL_e       R0.w,  T4.x,  T1.z      VEC_102 
         t: COS         R12.z,  R9.z      
     99  x: MULADD_e    R19.x,  R22.z,  (0x40400000, 3.0f).x, -R9.x      VEC_102 
         y: MULADD_e    R24.y,  R23.y,  (0x40400000, 3.0f).x, -R14.w      VEC_120 
         z: MULADD_e    R22.z,  R24.y,  (0x40400000, 3.0f).x, -R17.y      
         w: MULADD_e    R14.w,  R3.z,  R19.x,  T2.z      
         t: MULADD_e    R17.y,  R5.w,  R20.w,  T5.x      
    100  x: MUL_e       R15.x,  T1.x,  T4.z      
         y: MULADD_e    R25.y,  R11.y,  R21.w,  T3.z      
         z: MULADD_e    R20.z,  R26.y,  (0x40400000, 3.0f).x, -T0.w      VEC_210 
         w: MULADD_e    R20.w,  R6.z,  R25.y,  T6.w      
         t: ADD         R21.w, -R15.x,  1.0f      
    101  x: MULADD_e    R20.x,  R9.y,  R20.x,  T2.w      
         y: MULADD_e    R26.y,  R8.y,  R27.y,  T7.z      VEC_120 
         z: MULADD_e    R11.z,  R23.z,  (0x40400000, 3.0f).x, -T1.w      VEC_120 
         w: MULADD_e    R12.w,  R22.w,  (0x40400000, 3.0f).x, -T6.z      VEC_102 
         t: COS         R23.z,  R12.x      
    102  x: MUL_e       R21.x,  R8.w,  T3.w      
         y: MUL_e       R27.y,  R18.z,  T3.w      
         z: ADD*4       R18.z, -R17.z,  1.0f      VEC_120 
         w: CNDE        R13.w,  R21.x,  R21.x,  R9.z      
         t: SIN         R9.z,  T0.y      
    103  x: CNDE        R9.x,  R28.y,  R28.y,  R16.y      
         y: ADD         R28.y, -R10.z,  1.0f      
         z: FRACT       R10.z,  R17.x      
         w: SETGT       R8.w,  |R17.x|,  (0x42480000, 50.0f).x      
         t: SIN         R23.y,  T4.w      
    104  x: SETGT       R12.x,  |R16.x|,  (0x42480000, 50.0f).x      
         y: MUL         R3.y,  R40.y,  (0x3E22F983, 0.1591549367f).y      
         z: FRACT       R17.z,  R16.x      
         w: MUL_e       R22.w,  R13.x,  R19.y      VEC_120 
         t: MUL_e       R22.x,  R18.w,  R18.w      
05 ALU: ADDR(652) CNT(124) KCACHE0(CB1:0-15) 
    105  x: MUL_e       T0.x,  R4.x,  R20.y      
         y: MUL_e       R44.y,  KC0[0].z,  R8.z      
         z: MUL         R15.z,  R41.y,  (0x3E22F983, 0.1591549367f).x      
         w: MUL_e       T1.w,  R4.y,  R0.z      VEC_201 
         t: MIN_DX10    R29.x,  |R32.w|,  1.0f      
    106  x: MUL_e       R8.x,  R24.z,  R5.x      
         y: MUL_e       R51.y,  R8.x,  (0x42800000, 64.0f).x      
         z: MUL_e       R32.z,  R14.y,  R14.y      
         w: MUL_e       R17.w,  R16.z,  R16.z      VEC_120 
         t: MIN_DX10    R18.y,  |R50.y|,  1.0f      
    107  x: MIN_DX10    R1.x,  |R42.z|,  1.0f      
         y: ADD         T2.y,  R47.z, -R1.x      VEC_120 
         z: MUL_e       R5.z,  R4.w,  (0x38000100, 0.00003051850945f).x      
         w: MUL_e       R34.w,  R11.w,  (0x42800000, 64.0f).y      VEC_120 
         t: MUL_e       R45.z,  R6.y,  (0x42800000, 64.0f).y      
    108  x: MUL_e       T1.x,  R21.y,  R14.z      
         y: MUL_e       T3.y,  R18.x,  R14.z      
         z: ADD         T3.z, -R13.y,  1.0f      VEC_120 
         w: CNDE        R6.w,  R29.y,  R29.y,  R9.w      VEC_201 
         t: SIN         T6.z,  R6.w      
    109  x: ADD         T3.x, -R14.x,  1.0f      
         y: SETGT       T0.y,  |R1.y|,  (0x42480000, 50.0f).x      
         z: CNDE        R8.z,  R30.y,  R30.y,  R3.w      VEC_120 
         w: ADD*4       T3.w, -R10.w,  1.0f      VEC_120 
         t: COS         R10.w,  R10.y      
    110  x: SETGT       T5.x,  |R7.y|,  (0x42480000, 50.0f).x      
         y: MUL_e       T4.y,  R1.z,  R24.x      
         z: FRACT       T7.z,  R7.y      
         w: FRACT       T4.w,  R1.y      VEC_120 
         t: SIN         R14.z,  R11.x      
    111  x: MUL_e       T4.x,  R2.x,  R19.w      
         y: MUL_e       R10.y,  R25.x,  R25.x      VEC_120 
         z: MUL_e       T4.z,  R2.y,  R7.z      
         w: MUL         R4.w,  R7.w,  (0x3E22F983, 0.1591549367f).x      
         t: COS         R9.w,  R9.w      
    112  x: MUL         R11.x,  R30.x,  (0x3E22F983, 0.1591549367f).x      
         y: MUL_e       R13.y,  KC0[0].z,  R1.w      VEC_102 
         z: MIN_DX10    R33.z,  |R49.y|,  1.0f      
         w: MUL_e       R11.w,  R26.x,  R26.x      VEC_120 
         t: MUL_e       R14.x,  R16.w,  R16.w      
    113  x: MUL_e       R6.x,  R24.z,  R12.y      
         y: MIN_DX10    R30.y,  |R43.z|,  1.0f      VEC_120 
         z: MUL_e       R44.z,  R6.x,  (0x42800000, 64.0f).x      
         w: MIN_DX10    R15.w,  |R52.y|,  1.0f      
         t: MUL_e       R1.w,  R7.x,  (0x38000100, 0.00003051850945f).y      
    114  x: MUL_e       R17.x,  R10.x,  (0x42800000, 64.0f).x      
         y: MUL_e       R1.y,  R21.z,  (0x42800000, 64.0f).x      VEC_120 
         z: CNDE_INT    R21.z,  R8.w,  R17.x,  R10.z      VEC_210 
         w: ADD         R2.w,  R47.z, -R2.w      VEC_210 
         t: CNDE_INT    R27.z,  T0.y,  R1.y,  T4.w      
    115  x: DOT4_e      ____,  R21.x,  R31.y      
         y: DOT4_e      ____,  R27.y,  R23.w      
         z: DOT4_e      R9.z,  R9.z,  R24.w      VEC_120 
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: CNDE_INT    R27.y,  R12.x,  R16.x,  R17.z      
    116  x: DOT4_e      ____,  T1.x,  R32.y      
         y: DOT4_e      ____,  T3.y,  R25.w      VEC_201 
         z: DOT4_e      ____,  T6.z,  R33.y      VEC_102 
         w: DOT4_e      R0.w,  (0x80000000, -0.0f).x,  0.0f      
         t: MUL_e       R17.z,  R0.w,  R21.w      
    117  x: MUL_e       R13.x,  R18.z,  R28.y      
         y: MULADD_e    R7.y,  R19.y,  R13.x,  R22.w      VEC_021 
         z: MULADD_e    R10.z,  R3.z,  (0x40400000, 3.0f).x, -R14.w      VEC_210 
         w: CNDE_INT    R8.w,  T5.x,  R7.y,  T7.z      VEC_021 
         t: MUL_e       R3.z,  T3.w,  T3.x      
    118  x: MULADD_e    R2.x,  R11.y,  (0x40400000, 3.0f).x, -R25.y      
         y: MULADD_e    R4.y,  R0.z,  R4.y,  T1.w      VEC_210 
         z: MULADD_e    R19.z,  R6.z,  (0x40400000, 3.0f).x, -R20.w      VEC_021 
         w: MUL_e       R14.w,  R15.x,  T3.z      VEC_210 
         t: MULADD_e    R22.w,  R19.w,  R2.x,  T4.x      
    119  x: MULADD_e    R15.x,  R5.w,  (0x40400000, 3.0f).x, -R17.y      
         y: MULADD_e    R9.y,  R9.y,  (0x40400000, 3.0f).x, -R20.x      VEC_021 
         z: MULADD_e    R6.z,  R20.y,  R4.x,  T0.x      VEC_120 
         w: MUL_e       R13.w,  R23.z,  R12.z      
         t: SIN         R23.z,  R13.w      
    120  x: MULADD_e    R20.x,  R8.y,  (0x40400000, 3.0f).x, -R26.y      
         y: MULADD_e    R8.y,  R24.x,  R1.z,  T4.y      VEC_021 
         z: ADD*4       R1.z, -R19.x,  1.0f      VEC_120 
         w: ADD         R20.w, -R22.z,  1.0f      
         t: SIN         R18.z,  R9.x      
    121  x: ADD         R4.x, -R24.y,  1.0f      
         y: MUL_e       R2.y,  R23.y,  R12.z      VEC_120 
         z: MUL_e       R22.z,  R18.w,  R22.x      VEC_102 
         w: MULADD_e    R25.w,  R7.z,  R2.y,  T4.z      VEC_021 
         t: MUL_e       R23.y,  R29.x,  R29.x      
    122  x: FRACT       R19.x,  R15.z      
         y: CNDE        R16.y,  R34.y,  R34.y,  R13.z      
         z: SETGT       R12.z,  |R15.z|,  (0x42480000, 50.0f).x      
         w: CNDE        R24.w,  R35.y,  R35.y,  R3.x      VEC_120 
         t: COS         R21.w,  R16.y      
    123  x: SETGT       R9.x,  |R3.y|,  (0x42480000, 50.0f).x      
         y: MUL         R34.y,  R44.y,  (0x3E22F983, 0.1591549367f).y      VEC_120 
         z: MUL_e       R13.z,  R16.z,  R17.w      
         w: FRACT       R5.w,  R3.y      
         t: COS         R23.w,  R13.z      
    124  x: MUL_e       R5.x,  R14.y,  R32.z      
         y: MUL_e       R17.y,  KC0[0].z,  R5.x      
         z: MUL         R31.z,  R8.x,  (0x3E22F983, 0.1591549367f).x      
         w: MIN_DX10    R30.w,  |R51.y|,  1.0f      VEC_120 
         t: MUL_e       R36.z,  R18.y,  R18.y      VEC_102 
    125  x: MUL_e       R21.x,  R24.z,  R5.z      
         y: MUL_e       R35.y,  R1.x,  R1.x      
         z: MIN_DX10    R37.z,  |R34.w|,  1.0f      
         w: MUL_e       R33.w,  T2.y,  (0x42800000, 64.0f).x      
         t: MIN_DX10    R24.y,  |R45.z|,  1.0f      
06 ALU: ADDR(776) CNT(126) KCACHE0(CB1:0-15) 
    126  x: MUL_e       T0.x,  R10.w,  R9.w      
         y: MUL_e       T4.y,  R14.z,  R9.w      
         z: ADD*4       T6.z, -R20.z,  1.0f      VEC_120 
         w: ADD         T3.w, -R11.z,  1.0f      VEC_201 
         t: SIN         T3.z,  R6.w      
    127  x: CNDE        T1.x,  R4.z,  R4.z,  R22.y      
         y: MUL_e       T3.y,  R25.x,  R10.y      
         z: ADD         T7.z, -R12.w,  1.0f      
         w: CNDE        T4.w,  R26.w,  R26.w,  R15.y      VEC_120 
         t: COS         T0.y,  R3.w      
    128  x: FRACT       T4.x,  R11.x      
         y: FRACT       T2.y,  R4.w      
         z: SETGT       T4.z,  |R4.w|,  (0x42480000, 50.0f).x      
         w: SETGT       T1.w,  |R11.x|,  (0x42480000, 50.0f).x      
         t: SIN         T5.x,  R8.z      
    129  x: MUL_e       T3.x,  R16.w,  R14.x      
         y: MUL         R15.y,  R13.y,  (0x3E22F983, 0.1591549367f).x      
         z: MUL_e       T2.z,  R26.x,  R11.w      
         w: MUL_e       R3.w,  R33.z,  R33.z      
         t: COS         T1.z,  R15.y      
    130  x: MIN_DX10    R12.x,  |R44.z|,  1.0f      
         y: MUL_e       R12.y,  R30.y,  R30.y      
         z: MUL         R8.z,  R6.x,  (0x3E22F983, 0.1591549367f).x      
         w: MUL_e       R12.w,  KC0[0].z,  R12.y      
         t: MUL_e       R4.z,  R15.w,  R15.w      
    131  x: MUL_e       R7.x,  R2.w,  (0x42800000, 64.0f).x      VEC_120 
         y: MIN_DX10    R3.y,  |R17.x|,  1.0f      
         z: MUL_e       R11.z,  R24.z,  R1.w      VEC_021 
         w: MIN_DX10    R2.w,  |R1.y|,  1.0f      
         t: CNDE_INT    R24.z,  R9.x,  R3.y,  R5.w      
    132  x: DOT4_e      ____,  R13.w,  R36.y      
         y: DOT4_e      ____,  R2.y,  R25.z      VEC_201 
         z: DOT4_e      ____,  R23.z,  R27.w      VEC_120 
         w: DOT4_e      T2.w,  (0x80000000, -0.0f).x,  0.0f      
         t: CNDE_INT    R9.x,  T4.z,  R4.w,  T2.y      
    133  x: DOT4_e      ____,  T0.x,  R26.z      
         y: DOT4_e      R37.y,  T4.y,  R37.y      
         z: DOT4_e      ____,  T3.z,  R23.x      
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: MUL_e       R20.w,  R1.z,  R20.w      
    134  x: CNDE_INT    R23.x,  R12.z,  R15.z,  R19.x      VEC_120 
         y: CNDE_INT    R19.y,  T1.w,  R11.x,  T4.x      
         z: MULADD_e    R12.z,  R19.y,  (0x40400000, 3.0f).x, -R7.y      
         w: MULADD_e    R0.w,  R0.z,  (0x40400000, 3.0f).x, -R4.y      VEC_021 
         t: MULADD_e    R0.z,  R0.w,  R14.w,  0.0f      
    135  x: MULADD_e    R11.x,  R20.y,  (0x40400000, 3.0f).x, -R6.z      
         y: MULADD_e    T4.y,  R9.z,  R17.z,  0.0f      
         z: MULADD_e    R6.z,  R24.x,  (0x40400000, 3.0f).x, -R8.y      
         w: MUL_e       T1.w,  R13.x,  R4.x      VEC_120 
         t: MULADD_e    R9.z,  R19.w,  (0x40400000, 3.0f).x, -R22.w      VEC_021 
    136  x: ADD         R2.x, -R2.x,  1.0f      
         y: MULADD_e    R8.y,  R22.x,  R18.w,  R22.z      VEC_120 
         z: MULADD_e    R16.z,  R10.y,  R25.x,  T3.y      VEC_021 
         w: MULADD_e    R18.w,  R17.w,  R16.z,  R13.z      
         t: SIN         R13.z,  R16.y      
    137  x: MULADD_e    R13.x,  R14.x,  R16.w,  T3.x      
         y: MUL_e       R14.y,  R3.z,  T7.z      
         z: MUL_e       R3.z,  R21.w,  R23.w      VEC_021 
         w: MULADD_e    R21.w,  R32.z,  R14.y,  R5.x      VEC_201 
         t: SETGT       R16.w,  |R34.y|,  (0x42480000, 50.0f).x      
    138  x: MUL_e       R4.x,  T6.z,  T3.w      
         y: MULADD_e    R4.y,  R11.w,  R26.x,  T2.z      
         z: MULADD_e    R17.z,  R7.z,  (0x40400000, 3.0f).x, -R25.w      VEC_102 
         w: FRACT       R25.w,  R34.y      
         t: COS         R7.z,  R3.x      
    139  x: ADD*4       R26.x, -R10.z,  1.0f      
         y: MUL_e       R5.y,  R18.z,  R23.w      VEC_102 
         z: MUL_e       R10.z,  R29.x,  R23.y      
         w: CNDE        R24.w,  R5.y,  R5.y,  R21.z      
         t: SIN         R7.y,  R24.w      
    140  x: CNDE        R25.x,  R2.z,  R2.z,  R27.y      
         y: ADD         R16.y, -R19.z,  1.0f      VEC_120 
         z: MUL         R1.z,  R17.y,  (0x3E22F983, 0.1591549367f).x      
         w: MUL_e       R27.w,  R30.w,  R30.w      
         t: COS         R21.z,  R21.z      
    141  x: MUL_e       R5.x,  R18.y,  R36.z      VEC_021 
         y: MUL_e       R20.y,  R1.x,  R35.y      
         z: FRACT       R2.z,  R31.z      
         w: SETGT       R23.w,  |R31.z|,  (0x42480000, 50.0f).x      
         t: MUL_e       R10.x,  KC0[0].z,  R5.z      
    142  x: MIN_DX10    R16.x,  |R33.w|,  1.0f      
         y: MUL_e       R2.y,  R37.z,  R37.z      
         z: MUL_e       R23.z,  R24.y,  R24.y      
         w: MUL         R13.w,  R21.x,  (0x3E22F983, 0.1591549367f).x      
         t: MUL_e       R3.x,  T0.y,  T1.z      
    143  x: ADD         R20.x, -R9.y,  1.0f      
         y: MUL_e       R9.y,  T5.x,  T1.z      
         z: CNDE        R22.z,  R38.y,  R38.y,  R27.z      VEC_120 
         w: ADD*4       R19.w, -R20.x,  1.0f      VEC_120 
         t: SIN         R5.z,  T4.w      
    144  x: SETGT       R15.x,  |R15.y|,  (0x42480000, 50.0f).x      
         y: FRACT       R22.y,  R15.y      
         z: CNDE        R15.z,  R27.x,  R27.x,  R8.w      
         w: ADD         R22.w, -R15.x,  1.0f      VEC_120 
         t: COS         R14.w,  R22.y      
    145  x: SETGT       R27.x,  |R8.z|,  (0x42480000, 50.0f).x      
         y: MUL_e       R38.y,  R33.z,  R3.w      VEC_120 
         z: FRACT       R19.z,  R8.z      
         w: MUL         R4.w,  R12.w,  (0x3E22F983, 0.1591549367f).y      
         t: SIN         R18.z,  T1.x      
    146  x: MUL_e       R24.x,  R30.y,  R12.y      
         y: MUL_e       R26.y,  R12.x,  R12.x      
         z: MUL_e       R27.z,  R15.w,  R4.z      
         w: MUL_e       R26.w,  KC0[0].z,  R1.w      
         t: COS         R1.w,  R27.z      
    147  x: MUL         R19.x,  R11.z,  (0x3E22F983, 0.1591549367f).x      
         y: MUL_e       R36.y,  R3.y,  R3.y      VEC_120 
         z: MIN_DX10    R25.z,  |R7.x|,  1.0f      
         w: MUL_e       R5.w,  R2.w,  R2.w      
         t: MULADD_e    R26.z,  T2.w,  T1.w,  T4.y      
07 ALU: ADDR(902) CNT(126) 
    148  x: DOT4_e      ____,  R3.z,  R39.y      VEC_021 
         y: DOT4_e      ____,  R5.y,  R28.z      
         z: DOT4_e      T3.z,  R13.z,  R28.x      VEC_201 
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: CNDE_INT    R5.y,  R16.w,  R34.y,  R25.w      
    149  x: DOT4_e      ____,  R3.x,  R29.z      
         y: DOT4_e      ____,  R9.y,  R28.w      VEC_201 
         z: DOT4_e      ____,  R5.z,  R30.z      VEC_021 
         w: DOT4_e      T2.w,  (0x80000000, -0.0f).x,  0.0f      
         t: CNDE_INT    R5.z,  R15.x,  R15.y,  R22.y      
    150  x: MULADD_e    T5.x,  R37.y,  R14.y,  R0.z      VEC_021 
         y: CNDE_INT    R14.y,  R23.w,  R31.z,  R2.z      VEC_120 
         z: MUL_e       T4.z,  R20.w,  R2.x      
         w: MUL_e       T0.w,  R26.x,  R16.y      
         t: SIN         T2.z,  R24.w      
    151  x: MULADD_e    T2.x,  R22.x,  (0x40400000, 3.0f).x, -R8.y      VEC_021 
         y: MULADD_e    T7.y,  R32.z,  (0x40400000, 3.0f).x, -R21.w      VEC_201 
         z: MULADD_e    R19.z,  R17.w,  (0x40400000, 3.0f).x, -R18.w      
         w: CNDE_INT    R17.w,  R27.x,  R8.z,  R19.z      VEC_201 
         t: MULADD_e    R8.z,  R35.y,  R1.x,  R20.y      
    152  x: MULADD_e    R5.x,  R36.z,  R18.y,  R5.x      VEC_021 
         y: MUL_e       R10.y,  R19.w,  R22.w      
         z: MULADD_e    R10.z,  R23.y,  R29.x,  R10.z      VEC_102 
         w: MULADD_e    R19.w,  R10.y,  (0x40400000, 3.0f).x, -R16.z      VEC_021 
         t: SIN         R16.z,  R25.x      
    153  x: MULADD_e    R4.x,  R11.w,  (0x40400000, 3.0f).x, -R4.y      VEC_021 
         y: MULADD_e    T5.y,  R3.w,  R33.z,  R38.y      VEC_120 
         z: MULADD_e    R27.z,  R4.z,  R15.w,  R27.z      VEC_021 
         w: MUL_e       T4.w,  R4.x,  R20.x      
         t: COS         R11.w,  R27.y      
    154  x: MULADD_e    R13.x,  R14.x,  (0x40400000, 3.0f).x, -R13.x      
         y: MUL_e       T4.y,  R7.z,  R21.z      VEC_102 
         z: MUL_e       T1.z,  R7.y,  R21.z      VEC_102 
         w: MULADD_e    T6.w,  R12.y,  R30.y,  R24.x      VEC_021 
         t: ADD         T4.x, -R12.z,  1.0f      
    155  x: ADD*4       T7.x, -R0.w,  1.0f      
         y: CNDE        T6.y,  R40.y,  R40.y,  R24.z      
         z: ADD         T0.z, -R11.x,  1.0f      
         w: CNDE        R0.w,  R41.y,  R41.y,  R23.x      VEC_102 
         t: COS         R15.w,  R24.z      
    156  x: SETGT       T1.x,  |R1.z|,  (0x42480000, 50.0f).x      
         y: SETGT       T3.y,  |R13.w|,  (0x42480000, 50.0f).x      VEC_201 
         z: MUL_e       R24.z,  R30.w,  R27.w      
         w: FRACT       T1.w,  R1.z      
         t: FRACT       T2.y,  R13.w      
    157  x: MUL         R14.x,  R10.x,  (0x3E22F983, 0.1591549367f).x      
         y: MUL_e       R30.y,  R16.x,  R16.x      VEC_120 
         z: MUL_e       R7.z,  R37.z,  R2.y      
         w: MUL_e       T7.w,  R24.y,  R23.z      
         t: MUL_e       T3.x,  R14.w,  R1.w      
    158  x: CNDE        R11.x,  R30.x,  R30.x,  R19.y      
         y: MUL_e       T0.y,  R18.z,  R1.w      
         z: ADD         R6.z, -R6.z,  1.0f      VEC_120 
         w: CNDE        R1.w,  R7.w,  R7.w,  R9.x      
         t: SIN         T6.z,  R22.z      
    159  x: ADD         T0.x, -R17.z,  1.0f      
         y: ADD*4       T1.y, -R9.z,  1.0f      VEC_120 
         z: SETGT       T7.z,  |R4.w|,  (0x42480000, 50.0f).x      
         w: FRACT       T3.w,  R4.w      
         t: COS         R40.y,  R8.w      
    160  x: SETGT       T6.x,  |R19.x|,  (0x42480000, 50.0f).x      
         y: MUL         R41.y,  R26.w,  (0x3E22F983, 0.1591549367f).y      
         z: FRACT       T5.z,  R19.x      
         w: MUL_e       T5.w,  R12.x,  R26.y      VEC_120 
         t: SIN         R30.x,  R15.z      
    161  x: MUL_e       T1.x,  R2.w,  R5.w      
         y: CNDE_INT    R7.y,  T1.x,  R1.z,  T1.w      
         z: MUL_e       R1.z,  R3.y,  R36.y      
         w: MUL_e       R8.w,  R25.z,  R25.z      
         t: COS         R15.z,  R9.x      
    162  x: DOT4_e      ____,  T4.y,  R42.y      
         y: DOT4_e      ____,  T1.z,  R34.z      
         z: DOT4_e      ____,  T2.z,  R29.w      VEC_201 
         w: DOT4_e      R29.w,  (0x80000000, -0.0f).x,  0.0f      
         t: MULADD_e    R34.z,  T2.w,  T4.w,  T5.x      
    163  x: DOT4_e      ____,  T3.x,  R43.y      
         y: DOT4_e      R42.y,  T0.y,  R35.z      
         z: DOT4_e      ____,  T6.z,  R31.x      
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: CNDE_INT    R9.x,  T7.z,  R4.w,  T3.w      
    164  x: CNDE_INT    R24.x,  T3.y,  R13.w,  T2.y      
         y: MULADD_e    R43.y,  T3.z,  T4.z,  R26.z      
         z: MUL_e       R26.z,  T1.y,  T0.x      VEC_102 
         w: MUL_e       R13.w,  T0.w,  T4.x      
         t: ADD         R4.w, -T2.x,  1.0f      
    165  x: MULADD_e    R19.x,  R23.y,  (0x40400000, 3.0f).x, -R10.z      
         y: CNDE_INT    R12.y,  T6.x,  R19.x,  T5.z      VEC_021 
         z: MULADD_e    R10.z,  R3.w,  (0x40400000, 3.0f).x, -T5.y      
         w: MUL_e       R3.w,  T7.x,  T0.z      VEC_102 
         t: MULADD_e    R35.z,  R12.y,  (0x40400000, 3.0f).x, -T6.w      VEC_021 
    166  x: MULADD_e    R31.x,  R35.y,  (0x40400000, 3.0f).x, -R8.z      
         y: MULADD_e    R35.y,  R36.z,  (0x40400000, 3.0f).x, -R5.x      VEC_120 
         z: MULADD_e    R24.z,  R27.w,  R30.w,  R24.z      VEC_120 
         w: MULADD_e    R30.w,  R26.y,  R12.x,  T5.w      VEC_120 
         t: SIN         R8.z,  T6.y      
    167  x: MULADD_e    R12.x,  R23.z,  R24.y,  T7.w      VEC_021 
         y: MULADD_e    R24.y,  R5.w,  R2.w,  T1.x      VEC_021 
         z: ADD         R7.z, -T7.y,  1.0f      VEC_120 
         w: MULADD_e    R2.w,  R2.y,  R37.z,  R7.z      
         t: COS         R37.z,  R23.x      
    168  x: MUL_e       R23.x,  R11.w,  R15.w      
         y: MUL_e       R10.y,  R10.y,  R6.z      
         z: MULADD_e    R4.z,  R4.z,  (0x40400000, 3.0f).x, -R27.z      
         w: CNDE        R0.w,  R44.y,  R44.y,  R5.y      VEC_102 
         t: SIN         R44.y,  R0.w      
    169  x: MUL_e       R5.x,  R16.z,  R15.w      
         y: MULADD_e    R3.y,  R36.y,  R3.y,  R1.z      
         z: ADD*4       R1.z, -R19.z,  1.0f      VEC_120 
         w: SETGT       R15.w,  |R14.x|,  (0x42480000, 50.0f).x      
         t: COS         R16.z,  R5.y      
08 ALU: ADDR(1028) CNT(122) 
    170  x: CNDE        T7.x,  R8.x,  R8.x,  R14.y      
         y: MUL_e       T7.y,  R40.y,  R15.z      
         z: FRACT       T0.z,  R14.x      VEC_120 
         w: MUL_e       T7.w,  R16.x,  R30.y      VEC_210 
         t: SIN         T5.z,  R1.w      
    171  x: ADD         T6.x, -R19.w,  1.0f      
         y: MUL_e       T6.y,  R30.x,  R15.z      
         z: CNDE        T4.z,  R13.y,  R13.y,  R5.z      
         w: ADD*4       T5.w, -R4.x,  1.0f      VEC_120 
         t: COS         T0.w,  R19.y      
    172  x: SETGT       T1.x,  |R41.y|,  (0x42480000, 50.0f).x      
         y: FRACT       T5.y,  R41.y      
         z: CNDE        T6.z,  R6.x,  R6.x,  R17.w      
         w: ADD         T6.w, -R13.x,  1.0f      VEC_120 
         t: SIN         T3.z,  R11.x      
    173  x: MULADD_e    T0.x,  R42.y,  R10.y,  R34.z      VEC_021 
         y: MUL_e       T1.y,  R25.z,  R8.w      
         z: MULADD_e    T2.z,  R29.w,  R13.w,  R43.y      VEC_021 
         w: ADD         T4.w, -R19.x,  1.0f      
         t: COS         T3.w,  R5.z      
    174  x: DOT4_e      ____,  R23.x,  R45.y      
         y: DOT4_e      ____,  R5.x,  R38.z      VEC_210 
         z: DOT4_e      T7.z,  R8.z,  R31.w      VEC_201 
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: CNDE_INT    T4.y,  R15.w,  R14.x,  T0.z      
    175  x: DOT4_e      ____,  T7.y,  R46.y      
         y: DOT4_e      ____,  T6.y,  R39.z      VEC_201 
         z: DOT4_e      ____,  T5.z,  R40.z      VEC_120 
         w: DOT4_e      T2.w,  (0x80000000, -0.0f).x,  0.0f      
         t: MUL_e       T1.z,  R3.w,  R4.w      
    176  x: MUL_e       T1.x,  R1.z,  R7.z      
         y: MULADD_e    T2.y,  R27.w,  (0x40400000, 3.0f).x, -R24.z      
         z: CNDE_INT    R1.z,  T1.x,  R41.y,  T5.y      
         w: MULADD_e    R0.w,  R2.y,  (0x40400000, 3.0f).x, -R2.w      VEC_021 
         t: SIN         T0.z,  R0.w      
    177  x: MULADD_e    T6.x,  R30.y,  R16.x,  T7.w      
         y: MUL_e       T5.y,  T5.w,  T6.w      
         z: MULADD_e    R23.z,  R23.z,  (0x40400000, 3.0f).x, -R12.x      
         w: MUL_e       T5.w,  R26.z,  T6.x      VEC_102 
         t: MULADD_e    T4.x,  R36.y,  (0x40400000, 3.0f).x, -R3.y      VEC_021 
    178  x: MULADD_e    T3.x,  R5.w,  (0x40400000, 3.0f).x, -R24.y      
         y: MULADD_e    T0.y,  R26.y,  (0x40400000, 3.0f).x, -R30.w      VEC_021 
         z: MULADD_e    R25.z,  R8.w,  R25.z,  T1.y      VEC_201 
         w: MUL_e       T6.w,  R37.z,  R16.z      VEC_120 
         t: SIN         R37.z,  T7.x      
    179  x: ADD*4       T7.x, -R35.y,  1.0f      
         y: MUL_e       T1.y,  R44.y,  R16.z      VEC_120 
         z: ADD         T5.z, -R31.x,  1.0f      
         w: CNDE        T1.w,  R21.x,  R21.x,  R24.x      VEC_102 
         t: COS         T7.w,  R14.y      
    180  x: MUL_e       T2.x,  T0.w,  T3.w      
         y: CNDE        T7.y,  R17.y,  R17.y,  R7.y      
         z: MUL_e       T3.z,  T3.z,  T3.w      
         w: ADD         T0.w, -R10.z,  1.0f      VEC_120 
         t: COS         T3.w,  R7.y      
    181  x: ADD         T0.x, -R4.z,  1.0f      
         y: ADD*4       T6.y, -R35.z,  1.0f      VEC_120 
         z: MULADD_e    R4.z,  T2.w,  T5.w,  T0.x      
         w: CNDE        T2.w,  R12.w,  R12.w,  R9.x      VEC_201 
         t: SIN         T4.z,  T4.z      
    182  x: CNDE        T6.x,  R11.z,  R11.z,  R12.y      
         y: MUL_e       R30.y,  T7.x,  T5.z      
         z: MULADD_e    R11.z,  R30.y,  (0x40400000, 3.0f).x, -T6.x      
         w: MUL_e       T4.w,  T1.x,  T4.w      VEC_102 
         t: COS         T3.y,  R17.w      
    183  x: DOT4_e      ____,  T6.w,  R47.y      
         y: DOT4_e      ____,  T1.y,  R41.z      
         z: DOT4_e      ____,  T0.z,  R32.w      
         w: DOT4_e      T6.w,  (0x80000000, -0.0f).x,  0.0f      
         t: SIN         T1.x,  T6.z      
    184  x: DOT4_e      ____,  T2.x,  R32.x      
         y: DOT4_e      T1.y,  T3.z,  R48.y      
         z: DOT4_e      ____,  T4.z,  R49.y      VEC_120 
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: COS         T3.z,  R9.x      
    185  x: MUL_e       T0.x,  T5.y,  T0.w      
         y: MULADD_e    T5.y,  T7.z,  T1.z,  T2.z      
         z: MUL_e       T2.z,  T6.y,  T0.x      VEC_120 
         w: MUL_e       T0.w,  T7.w,  T3.w      VEC_021 
         t: SIN         T1.z,  T7.y      
    186  x: ADD         T2.x, -T2.y,  1.0f      
         y: MUL_e       T2.y,  R37.z,  T3.w      
         z: CNDE        R41.z,  R10.x,  R10.x,  T4.y      
         w: MULADD_e    R8.w,  R8.w,  (0x40400000, 3.0f).x, -R25.z      
         t: COS         R25.z,  R24.x      
    187  x: ADD         R10.x, -R23.z,  1.0f      
         y: ADD*4       R48.y, -R0.w,  1.0f      
         z: ADD         T7.z, -T0.y,  1.0f      
         w: MUL_e       T1.w,  T3.y,  T3.z      VEC_120 
         t: SIN         R49.y,  T1.w      
    188  x: CNDE        R9.x,  R26.w,  R26.w,  R1.z      
         y: MUL_e       T4.y,  T1.x,  T3.z      
         z: ADD         R23.z, -T3.x,  1.0f      VEC_120 
         w: ADD*4       R26.w, -T4.x,  1.0f      VEC_201 
         t: COS         R37.z,  T4.y      
    189  x: DOT4_e      ____,  T0.w,  R50.y      
         y: DOT4_e      ____,  T2.y,  R42.z      
         z: DOT4_e      R42.z,  T1.z,  R51.y      VEC_021 
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: SIN         ____,  T2.w      
    190  x: DOT4_e      ____,  T1.w,  R43.z      
         y: DOT4_e      ____,  T4.y,  R52.y      
         z: DOT4_e      ____,  PS189,  R44.z      VEC_021 
         w: DOT4_e      R32.w,  (0x80000000, -0.0f).x,  0.0f      
         t: COS         R0.w,  R12.y      
    191  x: MULADD_e    R24.x,  T1.y,  T0.x,  R4.z      
         y: MUL_e       R30.y,  R30.y,  T2.x      VEC_102 
         z: MULADD_e    R43.z,  T6.w,  T4.w,  T5.y      
         w: MUL_e       R17.w,  T2.z,  T7.z      
         t: SIN         R4.z,  T6.x      
09 ALU: ADDR(1150) CNT(44) 
    192  x: MUL_e       T2.x,  R48.y,  R10.x      
         y: MUL_e       T5.y,  R49.y,  R37.z      VEC_102 
         z: MUL_e       ____,  R26.w,  R23.z      
         w: ADD         ____, -R8.w,  1.0f      VEC_120 
         t: COS         T4.w,  R1.z      
    193  x: MUL_e       T0.x,  R25.z,  R37.z      
         y: MUL_e       T1.y,  R0.w,  PS192      
         z: MULADD_e    T2.z,  R32.w,  R17.w,  R24.x      VEC_120 
         w: MUL_e       T6.w,  PV192.z,  PV192.w      
         t: SIN         ____,  R41.z      
    194  x: ADD         T6.x, -R11.z,  1.0f      
         y: MUL_e       ____,  R4.z,  T4.w      VEC_120 
         z: MUL_e       ____,  PS193,  R33.w      
         t: SIN         ____,  R9.x      
    195  x: DOT4_e      ____,  T1.y,  R17.x      
         y: DOT4_e      ____,  PV194.y,  R1.y      
         z: DOT4_e      ____,  PS194,  R7.x      VEC_021 
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: MULADD_e    ____,  T5.y,  R45.z,  PV194.z      
    196  x: MUL_e       ____,  T2.x,  T6.x      
         y: MULADD_e    ____,  R42.z,  R30.y,  R43.z      
         z: MULADD_e    T2.z,  PV195.x,  T6.w,  T2.z      VEC_021 
         w: MULADD_e    ____,  T0.x,  R34.w,  PS195      VEC_201 
    197  z: MULADD_e    ____,  PV196.w,  PV196.x,  PV196.y      
    198  w: MULADD_e    T6.w,  T2.z,  0.5,  PV197.z      
    199  x: INTERP_XY   R30.x,  R0.y,  Param3.x      VEC_210 
         y: INTERP_XY   R30.y,  R0.x,  Param3.x      VEC_210 
         z: INTERP_XY   ____,  R0.y,  Param3.x      VEC_210 
         w: INTERP_XY   ____,  R0.x,  Param3.x      VEC_210 
    200  z: MULADD_e    ____,  T6.w,  (0x40400000, 3.0f).x,  R46.z      
    201  y: MULADD_e    T1.y,  R47.z,  (0x42F00000, 120.0f).x,  PV200.z      
    202  x: MUL         T0.x,  PV201.y,  (0x3E22F983, 0.1591549367f).x      
    203  z: FRACT       ____,  PV202.x      
         w: SETGT       ____,  |PV202.x|,  (0x42480000, 50.0f).x      
    204  y: CNDE_INT    ____,  PV203.w,  T0.x,  PV203.z      
    205  x: CNDE        ____,  T1.y,  T1.y,  PV204.y      
    206  t: SIN         ____,  PV205.x      
    207  w: MULADD_e    R34.w,  PS206,  (0x3F266666, 0.6499999762f).y,  (0x3EB33333, 0.349999994f).x      
10 TEX: ADDR(1264) CNT(2) VALID_PIX 
    208  SAMPLE R30.xyz_, R30.xy0x, t1, s0
    209  SAMPLE_LZ R47.xyz_, R34.ww0w, t0, s1
11 ALU: ADDR(1194) CNT(65) KCACHE0(CB1:0-15) 
    210  x: INTERP_XY   T1.x,  R0.y,  Param2.x      VEC_210 
         y: INTERP_XY   T1.y,  R0.x,  Param2.x      VEC_210 
         z: INTERP_XY   ____,  R0.y,  Param2.x      VEC_210 
         w: INTERP_XY   ____,  R0.x,  Param2.x      VEC_210 
    211  x: INTERP_ZW   ____,  R0.y,  Param2.x      VEC_210 
         y: INTERP_ZW   ____,  R0.x,  Param2.x      VEC_210 
         z: INTERP_ZW   T2.z,  R0.y,  Param2.x      VEC_210 
         w: INTERP_ZW   ____,  R0.x,  Param2.x      VEC_210 
    212  x: DOT4_e      ____,  T1.x,  T1.x      
         y: DOT4_e      ____,  T1.y,  T1.y      
         z: DOT4_e      ____,  PV211.z,  PV211.z      
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: MULADD_e    T2.x,  R30.x,  (0x40000000, 2.0f).y, -1.0f      
    213  x: ADD         T0.x, -PV212.x,  1.0f      
         y: MULADD_e    T5.y,  R30.y,  (0x40000000, 2.0f).x, -1.0f      
         z: MULADD_e    T7.z,  R30.z,  (0x40000000, 2.0f).x, -1.0f      
         t: RSQ_e       T6.w,  PV212.x      
    214  x: INTERP_XY   T6.x,  R0.y,  Param1.x      VEC_210 
         y: INTERP_XY   T6.y,  R0.x,  Param1.x      VEC_210 
         z: INTERP_XY   ____,  R0.y,  Param1.x      VEC_210 
         w: INTERP_XY   ____,  R0.x,  Param1.x      VEC_210 
    215  x: INTERP_ZW   ____,  R0.y,  Param1.x      VEC_210 
         y: INTERP_ZW   ____,  R0.x,  Param1.x      VEC_210 
         z: INTERP_ZW   T1.z,  R0.y,  Param1.x      VEC_210 
         w: INTERP_ZW   ____,  R0.x,  Param1.x      VEC_210 
    216  x: MUL_e       T0.x,  T1.x,  T6.w      
         y: MUL_e       T1.y,  T1.y,  T6.w      
         z: MUL_e       T2.z,  T2.z,  T6.w      
         w: MAX_DX10    T4.w,  T0.x,  0.0f      VEC_120 
         t: MUL_e       ____,  PV215.z,  PV215.z      
    217  x: DOT4_e      ____,  PV216.x,  T2.x      
         y: DOT4_e      ____,  PV216.y,  T5.y      VEC_102 
         z: DOT4_e      ____,  PV216.z,  T7.z      
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      
         t: MULADD_e    ____,  T6.y,  T6.y,  PS216      
    218  x: MOV*2       ____,  PV217.x      
         y: MOV         ____,  PV217.x      CLAMP 
         w: MULADD_e    ____,  T6.x,  T6.x,  PS217      
    219  x: MULADD_e    T0.x,  T0.x, -PV218.x,  T2.x      
         y: MULADD_e    T1.y,  T1.y, -PV218.x,  T5.y      
         z: MULADD_e    T2.z,  T2.z, -PV218.x,  T7.z      
         w: ADD         ____,  PV218.y,  (0x3ECCCCCD, 0.400000006f).x      
         t: RSQ_e       ____,  PV218.w      
    220  x: MUL_e       ____,  T6.x,  PS219      
         y: MUL_e       ____,  T6.y,  PS219      
         z: MUL_e       ____,  T1.z,  PS219      
         w: MIN_DX10    T6.w,  PV219.w,  1.0f      
    221  x: DOT4_e      ____,  PV220.x,  T0.x      CLAMP 
         y: DOT4_e      ____,  PV220.y,  T1.y      CLAMP 
         z: DOT4_e      ____,  PV220.z,  T2.z      CLAMP 
         w: DOT4_e      ____,  (0x80000000, -0.0f).x,  0.0f      CLAMP 
    222  t: LOG_e       ____,  PV221.x      
    223  w: MUL_e       ____,  PS222,  KC0[0].y      
    224  t: EXP_e       ____,  PV223.w      
    225  x: MULADD_e    ____,  T6.w,  R47.z,  PS224      
         y: MULADD_e    ____,  T6.w,  R47.y,  PS224      
         z: MULADD_e    ____,  T6.w,  R47.x,  PS224      
    226  x: MUL_e       R53.x,  T4.w,  PV225.z      
         y: MUL_e       R53.y,  T4.w,  PV225.y      
         z: MUL_e       R53.z,  T4.w,  PV225.x      
12 EXP_DONE: PIX0, R53
END_OF_PROGRAM

would suffer (that's HD5870, by the way). Doubly so, probably, as there are a hell of a lot of transcendentals. I can't remember the exact numbers, but this is something like 91 or 94% utilisation. Shaders like that are exceptional though.

Your figure is arrived at with fairytale accounting. Much of what is eliminated is simply transplanted, like the LUTs, squarer, cuber, exponent processing (exp, log, div).
I specifically kept those things. Seems you totally forgot the int32 multiplier.

You're going to increase the longest path,
Which is?

Well now you're reducing throughput even more. With the current architecture, in two cycles you can do 1 or 2 transcendentals and 9 or 8 regular ops. With your modification, you can only do one transcendental (and maybe one regular op alongside the square/cube cycle).
Transcendental throughput isn't very important.

If there was a square/cube cycle there'd be 2 lanes available. I'm proposing there only need be a square in the prior cycle, which leaves 3 lanes available.

Whole SIMD staggering is actually isomorphic to the quasi-scalar architecture I was proposing. It has the same requirement of needing more active batches. Four cycles of stagger between x and y, y and z, and z and w, for example, would have 20 total cycles of latency before proceeding to the next instruction group, so you'd need 5 active wavefronts instead of 2.
The stagger is 1 cycle twixt lanes as far as I can tell.
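
The arithmetic behind those figures can be sketched as below. This is only a back-of-envelope model: the 8-cycle base pipeline latency and the 4 cycles a wavefront takes to issue one instruction group (64 work items over 16 lanes) are assumptions carried over from the discussion, and the function names are made up for illustration.

```python
import math

# Assumptions (not measured): an 8-cycle ALU pipeline, and each wavefront
# occupying the issue slot for 4 cycles per instruction group.
BASE_PIPELINE_LATENCY = 8
CYCLES_PER_WAVEFRONT_ISSUE = 4


def group_latency(stagger_cycles, lanes=4, base=BASE_PIPELINE_LATENCY):
    """Cycles until the last staggered lane (x, y, z, w) has its result."""
    return base + (lanes - 1) * stagger_cycles


def wavefronts_needed(stagger_cycles):
    """Wavefronts that must be in flight to cover the group latency."""
    return math.ceil(group_latency(stagger_cycles) / CYCLES_PER_WAVEFRONT_ISSUE)


for stagger in (0, 1, 4):
    print(stagger, group_latency(stagger), wavefronts_needed(stagger))
```

Under these assumptions a 4-cycle stagger gives 20 cycles and 5 wavefronts, matching the quoted figure, while a 1-cycle stagger gives 11 cycles and 3 wavefronts.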

I'm pretty sure that ATI isn't doing this, though. The restrictions on dependent math are a big hint, IMO.
Which restrictions? It can do ADDs, MULs and MADs.

Anyway, I'm not arguing heavily for this. Just curious as to what it would entail.

Jawed
 
I believe it's 4 clocks latency between issuing the read request vs. when the data is available. Is it possible you've issued some prior LDS_READ2s in your code?
This is all of the LDS operations:

Code:
         19  x: LDS_WRITE   ____,  R1.x,  R0.x      
             z: ADD_INT     T0.z,  R1.w,  1      
             t: ADD         R0.x,  R5.x,  1.0f      
         20  x: LDS_READ2_RET  QAB,  R1.w,  PV19.z      
         21  y: MOV         T0.y,  QB.pop      VEC_120 
             w: MOV         T0.w,  QA.pop

There's no clause break between the write and the reads because the work group size is equal to the hardware thread size.

Jawed
 
In the SIGGRAPH Asia presentation you guys said this:
Low Latency Access per SIMD Engine
• 0 latency direct reads (Conflict free or Broadcast)
• 1 VLIW instruction latency for LDS indirect Op

If that's accurate direct reads should be available to be used in the later channels on the same cycle (in which case I am 100% certain the lanes are staggered clockwise).

PS. hmm, on second thought ... stagger is irrelevant considering that for SAD and dependent multiply ZW come first and direct LDS loads don't have channel requirements, weird ... guess I should have hedged my bets and said 99% in retrospect.

PPS. unless the documentation is incomplete and the direct read has to be in channel W for the results to be used on the same cycle.
 
We understand what you mean, but the 256 bit bus isn't bottlenecking it at the moment.
Bandwidth isn't a major bottleneck for HD5870, but it's more of a bottleneck than it was for HD4870. I also think that with a faster front-end RV870 would be bottlenecked by bandwidth. If RV870's successor brings a faster front-end, higher bandwidth will be needed to realise that advantage.
 
Why is no one using some kind of tile-based rendering on discrete graphics? :?:

At the very least, multisampling could be done entirely on chip in a tile cache, without wasting frame buffer bandwidth and space. For cards with 60-70 GB/s it could be quite handy.
 
This is all of the LDS operations:

Code:
         19  x: LDS_WRITE   ____,  R1.x,  R0.x      
             z: ADD_INT     T0.z,  R1.w,  1      
             t: ADD         R0.x,  R5.x,  1.0f      
         20  x: LDS_READ2_RET  QAB,  R1.w,  PV19.z      
         21  y: MOV         T0.y,  QB.pop      VEC_120 
             w: MOV         T0.w,  QA.pop

There's no clause break between the write and the reads because the work group size is equal to the hardware thread size.
Ok thanks, I must be getting confused. I might have been looking at an OpenCL sample that had a default thread group size of 256.
 
The L2 to L1 cache bandwidth is already 435 GB/s on Cypress and the aggregate L1 texture cache bandwidth is 1 TB/s (and these should be at the 850 MHz clock).

Those theoretical flops are spread in parallel across the 20 SIMDs and 1600 SPs, so 1 byte/flop could be reached with just 20 (L1 caches) x 138 GB/s. :?:
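
As a rough check of those ratios, here is a small sketch. It assumes peak throughput is counted as one MAD (2 flops) per SP per clock at 850 MHz; the bandwidth figures are the ones quoted above, and the helper names are purely illustrative.

```python
SIMDS = 20
STREAM_PROCESSORS = 1600
CLOCK_GHZ = 0.85
FLOPS_PER_SP_PER_CLOCK = 2  # assumption: one multiply-add counted as 2 flops

# Peak single-precision rate in GFLOPS under the assumption above.
peak_gflops = STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK * CLOCK_GHZ


def bytes_per_flop(bandwidth_gb_s):
    """Bytes delivered per peak flop for a given bandwidth in GB/s."""
    return bandwidth_gb_s / peak_gflops


def per_simd_bandwidth_for(target_bytes_per_flop):
    """GB/s each SIMD's L1 would need to reach the target ratio."""
    return target_bytes_per_flop * peak_gflops / SIMDS


print(peak_gflops)                  # total GFLOPS
print(bytes_per_flop(435.0))        # L2 to L1 bandwidth
print(bytes_per_flop(1000.0))       # aggregate L1 texture cache bandwidth
print(per_simd_bandwidth_for(1.0))  # GB/s per SIMD for 1 byte/flop
```

With these inputs the peak is about 2720 GFLOPS, so 435 GB/s works out to roughly 0.16 bytes/flop, 1 TB/s to roughly 0.37 bytes/flop, and 1 byte/flop needs about 136 GB/s per SIMD, in the same ballpark as the 138 GB/s figure above.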

Texture cache can be disregarded from the calculations.

The whole point of my comment/response was that 1B/flop is a pipedream going forward and graphics will have to rely on caching and blocking in order for performance to continue to improve as the flops race ever faster.

As far as their current cache bandwidths go, the caches in general are too small to significantly cut the miss rates.
 
Why is no one using some kind of tile-based rendering on discrete graphics? :?:

At the very least, multisampling could be done entirely on chip in a tile cache, without wasting frame buffer bandwidth and space. For cards with 60-70 GB/s it could be quite handy.

IMG comes to mind.
 
Actual compiled ISA:

Code:
         20  x: LDS_READ2_RET  QAB,  R1.w,  PV19.z      
         21  y: MOV         T0.y,  QB.pop      VEC_120 
             w: MOV         T0.w,  QA.pop

The earlier snippet I posted has 1 cycle latency between enqueue and pop, same as this snippet.

So I'm not sure what you're saying about latency :???:

Jawed
The latency may be higher if there is a bank conflict, in which case it will just stall. If there is no bank conflict it will be ready to be used next cycle, like your code suggests.

As far as their current cache bandwidths go, the caches in general are too small to significantly cut the miss rates.
And their register file?
 
Does anyone know the average ALU clause length of a game like Crysis? I wonder how much sense it would make to simply say screw registers, time for a memory to memory architecture. Spilling register sets to cache sounds nice in theory, but if in practice you are pushing/popping all the time it starts to become a bit silly.
 
Does anyone know the average ALU clause length of a game like Crysis?
The longest possible clause length is 32 logical cycles, 256 physical cycles.

I wonder how much sense it would make to simply say screw registers, time for a memory to memory architecture. Spilling register sets to cache sounds nice in theory, but if in practice you are pushing/popping all the time it starts to become a bit silly.
Clause temporary registers (Evergreen provides up to 8 of them, prior GPUs only provided 4 - so it seems 4 was too few... I'm not sure why 8 is considered enough, to be honest) are sort of a half-way house. They're registers whose lifetime is highly constrained.

The key question is: can you make spilling registers the target of latency-hiding?

In ATI, clauses increase the latency of a work item. The more clauses, the more ALU instructions are required to hide the latency caused by those clauses. Even if there's zero texturing.

So register-spilling is like clauses: it's a latency-inducing event. Therefore it's hideable.

In my view register spill has always been a low priority in ATI, because ALU:TEX has dominated. But there's no reason not to rewrite this as ALU:(TEX+spill). In fact the right hand side also includes things like memory export, control flow, waterfalling (constants or LDS) etc.

Jawed
 
The latency may be higher if there is a bank conflict, in which case it will just stall. If there is no bank conflict it will be ready to be used next cycle, like your code suggests.
My understanding of Evergreen ISA is that the "MOV dst, QA" instruction is an unnecessary detour.

e.g. this should be possible:

Code:
         11  x: LDS_READ2_RET  QAB,  R10.x,  R10.y      
         12  x: ADD         R0.x,  R2.x,  QA
             y: ADD         R0.y,  R2.y,  QA      
             z: ADD         R0.z,  R2.z,  QA      
             w: ADD         R0.w,  R2.w,  QA
             t: ADD         R0.x,  R0.x,  QB      
         13  x: ADD         R0.y,  R0.y,  QB
             y: ADD         R0.z,  R0.z,  QB     
             z: ADD         R0.w,  R0.w,  QB.pop

I think the "pop" pops both A and B queues simultaneously.

This is pure speculation though as I've never seen any code produce a compilation like this.
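
To make the guessed semantics concrete, here is a toy model. It is not a description of the hardware: the class and method names are invented for illustration, and the behaviour (reading QA or QB without ".pop" leaves the entry in place, while ".pop" retires the head of both queues) is exactly the speculation above, nothing more.

```python
from collections import deque


class LdsReturnQueues:
    """Toy model of the QA/QB return queues (hypothetical semantics)."""

    def __init__(self):
        self.qa = deque()
        self.qb = deque()

    def read2_ret(self, value_a, value_b):
        # Models LDS_READ2_RET: one result is enqueued on each queue.
        self.qa.append(value_a)
        self.qb.append(value_b)

    def peek_a(self):
        # Models using "QA" as an operand without ".pop".
        return self.qa[0]

    def peek_b(self):
        # Models using "QB" as an operand without ".pop".
        return self.qb[0]

    def pop(self):
        # Speculated: ".pop" retires the head of BOTH queues at once.
        return self.qa.popleft(), self.qb.popleft()


queues = LdsReturnQueues()
queues.read2_ret(10.0, 100.0)

r2 = [1.0, 2.0, 3.0, 4.0]
# Four adds read QA without popping, then four adds read QB.
r0 = [component + queues.peek_a() for component in r2]
r0 = [component + queues.peek_b() for component in r0]
queues.pop()  # the final ".pop" frees both entries

print(r0)
print(len(queues.qa), len(queues.qb))
```

If the hardware really behaves this way, the separate "MOV dst, QA.pop" instructions in the compiled code would be redundant whenever the queue values are consumed directly.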

Jawed
 
Each hardware thread can have 128KB of register file if it wants. That's 2KB per work item.

Jawed

And x86 can have 2^40 bytes of "register file" if it wants to. ;)

The issue is how much is actually available per thread once you have a realistic number of threads.
 
The issue is how much is actually available per thread once you have a realistic number of threads.
6 hardware threads, up to around 670 bytes, is an entirely sane allocation ;) i.e. 2 threads in control flow, 2 threads in ALU and 2 threads in TEX.
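
For what it's worth, the numbers can be reproduced with the sketch below. It assumes a 256KB register file per SIMD, 64 work items per hardware thread, 128-bit (16-byte) registers and a ceiling of 128 registers per work item; the constant and function names are illustrative only.

```python
WORK_ITEMS_PER_THREAD = 64              # one hardware thread (wavefront)
BYTES_PER_REGISTER = 16                 # 4 x 32-bit components
MAX_REGISTERS_PER_WORK_ITEM = 128       # assumption: architectural limit
REGISTER_FILE_PER_SIMD = 256 * 1024     # assumption: 256KB per SIMD


def bytes_per_work_item(hardware_threads):
    """Register bytes per work item with this many threads resident."""
    return REGISTER_FILE_PER_SIMD // (hardware_threads * WORK_ITEMS_PER_THREAD)


def registers_per_work_item(hardware_threads):
    """Whole 128-bit registers per work item, capped at the maximum."""
    whole = bytes_per_work_item(hardware_threads) // BYTES_PER_REGISTER
    return min(whole, MAX_REGISTERS_PER_WORK_ITEM)


# Upper bound: 128 registers x 16 bytes = 2KB per work item, 128KB per thread.
print(MAX_REGISTERS_PER_WORK_ITEM * BYTES_PER_REGISTER)
print(MAX_REGISTERS_PER_WORK_ITEM * BYTES_PER_REGISTER * WORK_ITEMS_PER_THREAD)

# 6 resident threads: 682 bytes each, i.e. 42 whole registers (672 bytes).
print(bytes_per_work_item(6), registers_per_work_item(6))
```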

Jawed
 