Jawed
Haven't really looked at that before. It seems the two write addresses are linked by the specified offset, so it's not two general-purpose writes.
Why only 16 writes? I see a DS_INST_WRITE2 opcode in the ISA.
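To illustrate the distinction drawn here between an offset-linked pair of writes and two general-purpose writes, a toy Python model follows. It is not taken from the ISA document: the function names, the plain list standing in for local memory, and the exact addressing rule (second address = first address + offset) are all assumptions made purely for illustration.

```python
# Toy model only: names and the addressing rule are assumptions, not the ISA's definition.

def write2_linked(lds, addr, offset, value0, value1):
    """One operation, one base address: the second address is derived from the first."""
    lds[addr] = value0
    lds[addr + offset] = value1


def write_general(lds, addr, value):
    """A general-purpose write: the address is chosen freely on every call."""
    lds[addr] = value


lds = [0] * 16

# Linked form: only pairs of locations exactly `offset` apart can be reached.
write2_linked(lds, addr=2, offset=4, value0=11, value1=22)

# Two general-purpose writes: any two locations, with no relationship between them.
write_general(lds, 9, 33)
write_general(lds, 15, 44)
```

Under these assumptions the linked form saves an address operand, but it cannot express an arbitrary scatter the way two independent writes can.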
The only observation I'm making is that in the current T this sequence of multiplies and adds is possible. The longest chain of instructions is 3 multiplies and an add.
But those are structured differently from the multiplies in the ALU units. As you mentioned to me, you can do two multiplies in series in the other ALUs, so the first multiplication starts very soon. In the T units, you have to get the LUT value and you need to do a partial square and cube before starting the multiply. The LUT coefficients don't need full IEEE precision to do those multiplies, and don't need another multiply after that (so they are further down the pipeline than in the other ALUs). Note how the other ALUs don't let you do an add after two serial muls, so that's not a viable path to accommodate the square/cube.
The structure is totally different. Look at Fig. 7 in the patent you linked to.
Don't forget that in the existing T the pipeline's 8 cycles need to produce a normalised floating-point result. The add isn't the end of the story. The timing is super-tight. If you count 2 cycles per multiply then there's not enough time to do all of: pre-processing, LUTs, add and the conversion.
It still needs a cycle. The squaring and cubing probably need two.
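The cycle-budget argument above can be written out as a few lines of arithmetic. The 8-cycle pipeline, the 2 cycles per multiply and the chain of 3 multiplies come from the posts. The assumption that each of the remaining stages costs at least one whole cycle is mine, as are all the names in this sketch.

```python
# Figures taken from the posts.
PIPELINE_CYCLES = 8
CYCLES_PER_MULTIPLY = 2   # the "if you count 2 cycles per multiply" hypothesis
CHAINED_MULTIPLIES = 3    # longest chain: 3 multiplies and an add

# Assumption for this sketch: every other stage needs at least one full cycle.
OTHER_STAGES = ["pre-processing", "LUTs", "add", "conversion"]
MIN_CYCLES_PER_STAGE = 1


def cycle_budget():
    """Return (cycles spent on multiplies, cycles left over, cycles the other stages need)."""
    multiply_cycles = CHAINED_MULTIPLIES * CYCLES_PER_MULTIPLY
    remaining = PIPELINE_CYCLES - multiply_cycles
    needed = len(OTHER_STAGES) * MIN_CYCLES_PER_STAGE
    return multiply_cycles, remaining, needed


multiply_cycles, remaining, needed = cycle_budget()
fits = needed <= remaining
print(f"multiplies: {multiply_cycles}, left over: {remaining}, needed: {needed}, fits: {fits}")
```

With these numbers the multiplies alone take 6 of the 8 cycles, leaving 2 for four further stages, which is the sense in which the 2-cycles-per-multiply count does not fit.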
I never said it was.
What makes you think the T-unit idling is responsible for the last 20-25%?
This is normal.
Or that the other ALUs are idling while the T-unit is working?
I didn't say it wouldn't cost performance. I said the typical best-case utilisation of shaders is such that it will be unaffected.
It absolutely will cost performance.
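Since the argument turns on how often the T unit is busy, one rough way to put a number on it is to count how many instruction groups in a listing like the one below fill the `t:` slot. The following Python sketch does that. The line format it parses is inferred from the listing itself rather than from any documented grammar, and `slot_usage` is a hypothetical helper, not part of any AMD tool.

```python
import re

# Assumed format: an optional group number, then a slot letter and a colon, then an opcode.
# Clause headers such as "00 ALU: ADDR(32) ..." do not match and are skipped.
SLOT_LINE = re.compile(r"^\s*(\d+)?\s*([xyzwt]):\s+\S+")


def slot_usage(listing):
    """Count instruction groups and how many instructions each slot issues."""
    groups = 0
    counts = {slot: 0 for slot in "xyzwt"}
    for line in listing.splitlines():
        match = SLOT_LINE.match(line)
        if match is None:
            continue
        if match.group(1) is not None:  # a leading number starts a new group
            groups += 1
        counts[match.group(2)] += 1
    return groups, counts


# The first three groups of the listing, copied verbatim.
SAMPLE = """\
00 ALU: ADDR(32) CNT(124) KCACHE0(CB1:0-15) KCACHE1(CB0:0-15)
0 x: INTERP_XY R22.x, R0.y, Param0.x VEC_210
y: INTERP_XY R22.y, R0.x, Param0.x VEC_210
z: INTERP_XY ____, R0.y, Param0.x VEC_210
w: INTERP_XY ____, R0.x, Param0.x VEC_210
1 x: MUL_e ____, PV0.x, (0x43000000, 128.0f).x
y: MUL_e ____, PV0.x, (0x42800000, 64.0f).y
z: MUL_e ____, PV0.y, (0x42800000, 64.0f).y
w: MUL_e ____, PV0.y, (0x43000000, 128.0f).x
t: MOV*2 R24.z, KC0[0].z
2 x: FLOOR ____, PV1.z
y: FLOOR ____, PV1.y
z: FLOOR ____, PV1.x
w: FLOOR ____, PV1.w
t: MUL_e R46.z, R22.x, (0x42F00000, 120.0f).x
"""

groups, counts = slot_usage(SAMPLE)
t_share = counts["t"] / groups
print(f"{groups} groups, t slot filled in {counts['t']} ({t_share:.0%})")
```

This only measures static slot occupancy in the compiled code. It says nothing about what the hardware does at run time, so it is at best a first-order check on the utilisation claims.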
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(124) KCACHE0(CB1:0-15) KCACHE1(CB0:0-15)
0 x: INTERP_XY R22.x, R0.y, Param0.x VEC_210
y: INTERP_XY R22.y, R0.x, Param0.x VEC_210
z: INTERP_XY ____, R0.y, Param0.x VEC_210
w: INTERP_XY ____, R0.x, Param0.x VEC_210
1 x: MUL_e ____, PV0.x, (0x43000000, 128.0f).x
y: MUL_e ____, PV0.x, (0x42800000, 64.0f).y
z: MUL_e ____, PV0.y, (0x42800000, 64.0f).y
w: MUL_e ____, PV0.y, (0x43000000, 128.0f).x
t: MOV*2 R24.z, KC0[0].z
2 x: FLOOR ____, PV1.z
y: FLOOR ____, PV1.y
z: FLOOR ____, PV1.x
w: FLOOR ____, PV1.w
t: MUL_e R46.z, R22.x, (0x42F00000, 120.0f).x
3 x: MULADD_e T1.x, PV2.x, (0x3C800000, 0.015625f).x, KC1[0].y
y: MULADD_e ____, PV2.y, (0x3C800000, 0.015625f).x, KC1[0].x
z: MULADD_e T2.z, PV2.w, (0x3C800000, 0.015625f).x, KC1[0].y
w: MULADD_e T1.w, PV2.z, (0x3C800000, 0.015625f).x, KC1[0].x
t: MOV R53.w, (0x3F800000, 1.0f).y
4 x: ADD T3.x, PV3.y, KC1[1].x
y: MUL_e T0.y, PV3.w, (0x49742400, 1000000.0f).x
z: MUL_e ____, PV3.y, (0x49742400, 1000000.0f).x
w: ADD R2.w, PV3.w, KC1[1].x
t: ADD T2.x, R22.x, -PV3.y
5 x: ADD R8.x, PV4.x, KC1[2].x
y: MUL_e T1.y, PV4.x, (0x49742400, 1000000.0f).x
z: MUL_e T0.z, PV4.w, (0x49742400, 1000000.0f).x
w: ADD R4.w, PV4.w, KC1[2].x
t: F_TO_I T2.y, PV4.z
6 x: ADD R11.x, PV5.x, KC1[3].x
y: MUL_e T0.y, PV5.w, (0x49742400, 1000000.0f).x
z: MUL_e T1.z, PV5.x, (0x49742400, 1000000.0f).x
w: ADD R6.w, PV5.w, KC1[3].x
t: F_TO_I T0.w, T0.y
7 x: ADD R12.x, PV6.x, KC1[4].x
y: MUL_e T3.y, PV6.x, (0x49742400, 1000000.0f).x
z: MUL_e T3.z, PV6.w, (0x49742400, 1000000.0f).x
w: ADD R7.w, PV6.w, KC1[4].x
t: MULLO_INT ____, T2.y, T2.y
8 x: ADD R13.x, PV7.x, KC1[5].x
y: MUL_e R2.y, PV7.w, (0x49742400, 1000000.0f).x
z: MUL_e R0.z, PV7.x, (0x49742400, 1000000.0f).x
w: ADD_INT ____, T2.y, PS7
t: F_TO_I T0.x, T1.y
9 x: MUL_e R7.x, PV8.x, (0x49742400, 1000000.0f).x
y: ADD R8.y, R7.w, KC1[5].x
z: ASHR ____, PV8.w, (0x00000010, 2.242077543e-44f).y
w: ADD R11.w, PV8.x, KC1[6].x
t: MULLO_INT ____, T0.w, T0.w
10 x: AND_INT R4.x, PV9.z, (0x00007FFF, 4.591634678e-41f).x
y: ADD_INT T1.y, T0.w, PS9
z: MUL_e R3.z, PV9.y, (0x49742400, 1000000.0f).y
w: ADD R13.w, PV9.y, KC1[6].x
t: F_TO_I T2.y, T0.z
11 x: INTERP_ZW ____, R0.y, Param0.x VEC_210
y: INTERP_ZW ____, R0.x, Param0.x VEC_210
z: INTERP_ZW R47.z, R0.y, Param0.x VEC_210
w: INTERP_ZW ____, R0.x, Param0.x VEC_210
12 x: MUL_e ____, PV11.z, (0x43000000, 128.0f).x
y: ASHR ____, T1.y, (0x00000010, 2.242077543e-44f).y
z: ADD T0.z, R22.y, -T1.x VEC_102
w: MUL_e ____, PV11.z, (0x42800000, 64.0f).z
t: MULLO_INT ____, T0.x, T0.x
13 x: FLOOR T0.x, PV12.x
y: AND_INT R4.y, PV12.y, (0x00007FFF, 4.591634678e-41f).x
z: ADD_INT ____, T0.x, PS12
w: FLOOR ____, PV12.w
t: F_TO_I T1.y, T1.z
14 x: ADD T1.x, T1.x, KC1[1].y
y: ASHR ____, PV13.z, (0x00000010, 2.242077543e-44f).x
z: MUL_e R4.z, R11.w, (0x49742400, 1000000.0f).y
w: MULADD_e T2.w, PV13.w, (0x3C800000, 0.015625f).z, KC1[0].z
t: MULLO_INT ____, T2.y, T2.y
15 x: AND_INT R9.x, PV14.y, (0x00007FFF, 4.591634678e-41f).x
y: ADD R21.y, R11.w, KC1[7].x
z: MULADD_e R1.z, T0.x, (0x3C800000, 0.015625f).y, KC1[0].z
w: ADD_INT ____, T2.y, PS14
t: F_TO_I T0.w, T0.y
16 x: ADD R2.x, T2.z, KC1[1].y
y: ADD R1.y, R22.y, -T2.z
z: ASHR ____, PV15.w, (0x00000010, 2.242077543e-44f).x
w: ADD R1.w, R22.x, -T1.w
t: MULLO_INT ____, T1.y, T1.y
17 x: MUL_e R10.x, R13.w, (0x49742400, 1000000.0f).x
y: AND_INT R6.y, PV16.z, (0x00007FFF, 4.591634678e-41f).y
z: ADD R21.z, R13.w, KC1[7].x
w: ADD_INT ____, T1.y, PS16
t: F_TO_I R1.x, T3.y
18 x: ADD R5.x, R47.z, -T2.w VEC_102
y: MUL_e R31.y, T2.x, (0x42800000, 64.0f).x
z: ASHR T0.z, PV17.w, (0x00000010, 2.242077543e-44f).y
w: MUL_e R23.w, T0.z, (0x42800000, 64.0f).x
t: MULLO_INT ____, T0.w, T0.w
19 x: ADD R6.x, R22.x, -T3.x
y: ADD_INT ____, T0.w, PS18
z: ADD R2.z, T2.w, KC1[1].z VEC_120
w: ADD R3.w, R22.y, -T1.x VEC_021
t: F_TO_I R3.y, T3.z
20 x: ASHR R3.x, PV19.y, (0x00000010, 2.242077543e-44f).x
y: ADD R5.y, T1.x, KC1[2].y
z: AND_INT R5.z, T0.z, (0x00007FFF, 4.591634678e-41f).y
w: MUL_e R5.w, R21.y, (0x49742400, 1000000.0f).z
t: MULLO_INT R0.w, R1.x, R1.x
01 ALU: ADDR(156) CNT(125) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15)
21 x: ADD T1.x, R47.z, -R1.z
y: MUL_e R32.y, R1.w, (0x42800000, 64.0f).x
z: ADD_INT ____, R1.x, R0.w
w: MUL_e R25.w, R1.y, (0x42800000, 64.0f).x
t: F_TO_I T1.y, R0.z
22 x: ADD T3.x, R22.x, -R2.w
y: ASHR T3.y, PV21.z, (0x00000010, 2.242077543e-44f).x
z: ADD T0.z, R22.y, -R2.x
w: ADD T2.w, R1.z, KC0[1].z
t: MULLO_INT ____, R3.y, R3.y
23 x: AND_INT T5.x, R3.x, (0x00007FFF, 4.591634678e-41f).x
y: MUL_e T4.y, R21.z, (0x49742400, 1000000.0f).y
z: ADD T3.z, R2.x, KC0[2].y VEC_120
w: ADD_INT ____, R3.y, PS22
t: F_TO_I T0.w, R2.y
24 x: MIN_DX10 R3.x, |R23.w|, 1.0f
y: MIN_DX10 R1.y, |R31.y|, 1.0f
z: ASHR T2.z, PV23.w, (0x00000010, 2.242077543e-44f).x
w: MUL_e R24.w, R5.x, (0x42800000, 64.0f).y
t: U_TO_F ____, R4.x
25 x: ADD T0.x, R47.z, -R2.z
y: MUL_e R36.y, R6.x, (0x42800000, 64.0f).x
z: MUL_e R25.z, R3.w, (0x42800000, 64.0f).x
w: MUL_e T1.w, PS24, (0x38000100, 0.00003051850945f).y
t: MULLO_INT ____, T1.y, T1.y
26 x: ADD T4.x, R2.z, KC0[2].z
y: ADD T0.y, R22.x, -R8.x
z: MUL_e R13.z, R24.z, PV25.w VEC_120
w: ADD_INT ____, T1.y, PS25
t: F_TO_I T2.x, R7.x
27 x: ADD T1.x, R5.y, KC0[3].y
y: MUL_e R33.y, T1.x, (0x42800000, 64.0f).x
z: ASHR T1.z, PV26.w, (0x00000010, 2.242077543e-44f).y
w: ADD T3.w, R22.y, -R5.y VEC_102
t: U_TO_F ____, R4.y
28 x: AND_INT R7.x, T3.y, (0x00007FFF, 4.591634678e-41f).x
y: MUL_e T2.y, PS27, (0x38000100, 0.00003051850945f).y
z: MIN_DX10 R1.z, |R32.y|, 1.0f VEC_120
w: MIN_DX10 R1.w, |R25.w|, 1.0f
t: MULLO_INT ____, T0.w, T0.w
29 x: MUL_e R14.x, R24.z, PV28.y
y: ADD_INT T3.y, T0.w, PS28
z: MUL_e R26.z, T3.x, (0x42800000, 64.0f).x
w: ADD T0.w, R47.z, -T2.w VEC_120
t: F_TO_I T1.y, R3.z
30 x: ADD T3.x, T2.w, KC0[2].z
y: MUL_e R37.y, T0.z, (0x42800000, 64.0f).x
z: ADD T0.z, R22.y, -T3.z
w: ADD T2.w, R22.x, -R4.w
t: U_TO_F ____, R9.x
31 x: ADD R9.x, T3.z, KC0[3].y
y: AND_INT R4.y, T2.z, (0x00007FFF, 4.591634678e-41f).x VEC_120
z: MUL_e R3.z, PS30, (0x38000100, 0.00003051850945f).y
w: ASHR R4.w, T3.y, (0x00000010, 2.242077543e-44f).z
t: MULLO_INT ____, T2.x, T2.x
32 x: MUL R2.x, R13.z, (0x3E22F983, 0.1591549367f).x
y: MUL_e R20.y, KC1[0].z, T1.w
z: ADD_INT T2.z, T2.x, PS31
w: MIN_DX10 R9.w, |R24.w|, 1.0f
t: F_TO_I T3.y, R4.z
33 x: MUL_e R17.x, R24.z, R3.z
y: MUL_e R10.y, R3.x, R3.x
z: MUL_e R7.z, R1.y, R1.y
w: MUL_e R27.w, T0.x, (0x42800000, 64.0f).x VEC_120
t: U_TO_F ____, R6.y
34 x: ADD R8.x, R47.z, -T4.x
y: MIN_DX10 R13.y, |R25.z|, 1.0f VEC_120
z: MIN_DX10 R0.z, |R36.y|, 1.0f
w: MUL_e R0.w, PS33, (0x38000100, 0.00003051850945f).x
t: MULLO_INT ____, T1.y, T1.y
35 x: ADD R6.x, T4.x, KC0[3].z
y: MUL_e R39.y, T0.y, (0x42800000, 64.0f).x
z: MUL_e R28.z, T3.w, (0x42800000, 64.0f).x
w: ADD_INT R3.w, T1.y, PS34 VEC_120
t: F_TO_I T3.w, R10.x
36 x: ADD R10.x, R22.x, -R11.x
y: ADD R5.y, R22.y, -T1.x VEC_021
z: ADD R5.z, T1.x, KC0[4].y VEC_201
w: AND_INT R2.w, T1.z, (0x00007FFF, 4.591634678e-41f).x
t: U_TO_F ____, R5.z
37 x: MUL_e R18.x, KC1[0].z, T2.y VEC_102
y: ASHR R2.y, T2.z, (0x00000010, 2.242077543e-44f).x
z: MUL R2.z, R14.x, (0x3E22F983, 0.1591549367f).y
w: MUL_e R8.w, PS36, (0x38000100, 0.00003051850945f).z
t: MULLO_INT ____, T3.y, T3.y
38 x: MIN_DX10 R1.x, |R33.y|, 1.0f
y: MUL_e R11.y, R1.z, R1.z
z: MUL_e R11.z, R1.w, R1.w
w: ADD_INT R5.w, T3.y, PS37 VEC_120
t: F_TO_I R11.x, R5.w
39 x: MUL_e R23.x, T0.w, (0x42800000, 64.0f).x
y: MIN_DX10 R12.y, |R26.z|, 1.0f
z: MUL_e R20.z, R24.z, R0.w VEC_120
w: MIN_DX10 R10.w, |R37.y|, 1.0f
t: U_TO_F ____, T5.x
40 x: ADD R4.x, R47.z, -T3.x
y: MUL_e R7.y, PS39, (0x38000100, 0.00003051850945f).x
z: MUL_e R29.z, T2.w, (0x42800000, 64.0f).y
w: MUL_e R28.w, T0.z, (0x42800000, 64.0f).y VEC_120
t: MULLO_INT ____, T3.w, T3.w
41 x: ADD R5.x, R22.x, -R6.w
y: ADD_INT R3.y, T3.w, PS40
z: ADD R4.z, R22.y, -R9.x
w: ADD R6.w, T3.x, KC0[3].z VEC_201
t: F_TO_I R6.y, T4.y
02 ALU: ADDR(281) CNT(125) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15)
42 x: ASHR T5.x, R3.w, (0x00000010, 2.242077543e-44f).x
y: SETGT T4.y, |R2.x|, (0x42480000, 50.0f).y
z: ADD T0.z, R9.x, KC0[4].y VEC_120
w: AND_INT T2.w, R4.w, (0x00007FFF, 4.591634678e-41f).z VEC_120
t: U_TO_F T3.w, R7.x
43 x: MUL_e R9.x, R9.w, R9.w
y: MUL R9.y, R20.y, (0x3E22F983, 0.1591549367f).x
z: FRACT T2.z, R2.x
w: MUL_e T0.w, R1.y, R7.z VEC_120
t: MULLO_INT ____, R11.x, R11.x
44 x: MUL_e R21.x, KC1[0].z, R3.z
y: MUL R4.y, R17.x, (0x3E22F983, 0.1591549367f).x
z: ADD_INT T4.z, R11.x, PS43 VEC_120
w: MUL_e T1.w, R3.x, R10.y VEC_201
t: U_TO_F T3.x, R4.y
45 x: MIN_DX10 R15.x, |R27.w|, 1.0f
y: MUL_e R28.y, R24.z, R8.w
z: MUL_e R17.z, R13.y, R13.y
w: MUL_e R15.w, R0.z, R0.z VEC_120
t: MULLO_INT ____, R6.y, R6.y
46 x: MUL_e R28.x, R8.x, (0x42800000, 64.0f).x
y: MIN_DX10 R17.y, |R39.y|, 1.0f
z: MIN_DX10 R18.z, |R28.z|, 1.0f
w: ADD_INT R3.w, R6.y, PS45 VEC_120
t: MUL_e R14.z, T3.w, (0x38000100, 0.00003051850945f).y
47 x: ADD T2.x, R47.z, -R6.x
y: MUL_e R42.y, R10.x, (0x42800000, 64.0f).x
z: MUL_e R34.z, R5.y, (0x42800000, 64.0f).x
w: ADD T4.w, R6.x, KC0[4].z VEC_120
t: U_TO_F T6.x, R2.w
48 x: ADD T7.x, R22.x, -R12.x
y: ADD T2.y, R5.z, KC0[5].y
z: AND_INT T1.z, R2.y, (0x00007FFF, 4.591634678e-41f).x
w: ADD T3.w, R22.y, -R5.z VEC_120
t: ASHR T3.z, R5.w, (0x00000010, 2.242077543e-44f).y
49 x: FRACT T1.x, R2.z
y: SETGT T3.y, |R2.z|, (0x42480000, 50.0f).x
z: MUL_e R19.z, R1.x, R1.x
w: MUL R5.w, R18.x, (0x3E22F983, 0.1591549367f).y VEC_120
t: MUL_e T0.x, R1.z, R11.y
50 x: MUL_e T4.x, R1.w, R11.z
y: MUL_e R29.y, KC1[0].z, R0.w
z: MUL R3.z, R20.z, (0x3E22F983, 0.1591549367f).x
w: MIN_DX10 R16.w, |R23.x|, 1.0f
t: MUL_e R18.y, R12.y, R12.y
51 x: MUL_e R16.x, R10.w, R10.w
y: MUL_e R30.y, R24.z, R7.y
z: MUL_e R30.z, R4.x, (0x42800000, 64.0f).x
w: MIN_DX10 R17.w, |R29.z|, 1.0f VEC_120
t: MIN_DX10 R19.y, |R28.w|, 1.0f
52 x: ADD R5.x, R47.z, -R6.w
y: MUL_e R43.y, R5.x, (0x42800000, 64.0f).x
z: MUL_e R35.z, R4.z, (0x42800000, 64.0f).x VEC_120
w: MUL_e R12.w, T3.x, (0x38000100, 0.00003051850945f).y VEC_120
t: ADD R5.y, R6.w, KC0[4].z
53 x: ADD R4.x, T0.z, KC0[5].y
y: ADD R2.y, R22.y, -T0.z
z: AND_INT R5.z, T5.x, (0x00007FFF, 4.591634678e-41f).x
w: ADD R7.w, R22.x, -R7.w VEC_102
t: U_TO_F R4.z, T2.w
54 x: ASHR R2.x, R3.y, (0x00000010, 2.242077543e-44f).x
y: CNDE_INT R16.y, T4.y, R2.x, T2.z VEC_120
z: MULADD_e R12.z, R11.z, R1.w, T4.x VEC_201
w: CNDE_INT R19.w, T3.y, R2.z, T1.x VEC_210
t: MUL_e R34.y, KC1[0].z, R8.w
55 x: MULADD_e R6.x, R10.y, R3.x, T1.w
y: MUL_e R3.y, R9.w, R9.x VEC_102
z: MULADD_e R8.z, R7.z, R1.y, T0.w VEC_120
w: MULADD_e R6.w, R11.y, R1.z, T0.x VEC_102
t: MUL_e R24.y, R18.z, R18.z VEC_102
56 x: SETGT R3.x, |R9.y|, (0x42480000, 50.0f).x
y: FRACT R1.y, R4.y VEC_120
z: SETGT R1.z, |R4.y|, (0x42480000, 50.0f).x VEC_120
w: FRACT R1.w, R9.y
t: MUL R7.x, R21.x, (0x3E22F983, 0.1591549367f).y
57 x: MUL_e R12.x, R0.z, R15.w
y: MUL_e R23.y, R15.x, R15.x
z: MUL_e R9.z, R13.y, R17.z
w: MUL R14.w, R28.y, (0x3E22F983, 0.1591549367f).x VEC_120
t: MIN_DX10 R21.w, |R28.x|, 1.0f
58 x: MIN_DX10 R19.x, |R42.y|, 1.0f
y: MUL_e R35.y, R24.z, R14.z
z: MUL_e R22.z, R17.y, R17.y VEC_120
w: MUL_e R29.w, T2.x, (0x42800000, 64.0f).x
t: MIN_DX10 R25.y, |R34.z|, 1.0f
59 x: ADD R8.x, R47.z, -T4.w
y: MUL_e R45.y, T7.x, (0x42800000, 64.0f).x
z: MUL_e R38.z, T3.w, (0x42800000, 64.0f).x
w: MUL_e R18.w, T6.x, (0x38000100, 0.00003051850945f).y VEC_120
t: ADD R15.z, T4.w, KC0[5].z
60 x: ADD R11.x, R22.x, -R13.x
y: ADD R6.y, R22.y, -T2.y
z: ADD R16.z, T2.y, KC0[6].y VEC_120
w: AND_INT R4.w, T3.z, (0x00007FFF, 4.591634678e-41f).x
t: U_TO_F R2.w, T1.z
61 x: FRACT R13.x, R5.w
y: ASHR R14.y, T4.z, (0x00000010, 2.242077543e-44f).x
z: SETGT R2.z, |R5.w|, (0x42480000, 50.0f).y
w: MUL_e R0.w, R1.x, R19.z
t: SETGT R8.w, |R3.z|, (0x42480000, 50.0f).y
62 x: MUL_e R10.x, R12.y, R18.y
y: MUL R15.y, R29.y, (0x3E22F983, 0.1591549367f).x VEC_201
z: FRACT R6.z, R3.z
w: MUL_e R22.w, R16.w, R16.w
t: MUL_e R10.z, R10.w, R16.x
03 ALU: ADDR(406) CNT(125) KCACHE0(CB1:0-15) KCACHE1(CB0:0-15)
63 x: MIN_DX10 R20.x, |R30.z|, 1.0f
y: MUL_e R26.y, R17.w, R17.w
z: MUL T2.z, R30.y, (0x3E22F983, 0.1591549367f).x
w: MUL_e R26.w, KC0[0].z, R7.y
t: MUL_e R23.z, R19.y, R19.y VEC_102
64 x: MUL_e R31.x, R5.x, (0x42800000, 64.0f).x
y: MIN_DX10 R27.y, |R43.y|, 1.0f
z: MUL_e R4.z, R24.z, R12.w
w: MIN_DX10 R20.w, |R35.z|, 1.0f VEC_120
t: MUL_e R7.y, R4.z, (0x38000100, 0.00003051850945f).y
65 x: ADD T6.x, R47.z, -R5.y
y: MUL_e R46.y, R7.w, (0x42800000, 64.0f).x
z: MUL_e R39.z, R2.y, (0x42800000, 64.0f).x
w: ADD T3.w, R5.y, KC1[5].z VEC_120
t: U_TO_F T7.x, R5.z
66 x: ADD ____, R22.x, -R8.y
y: ADD T2.y, R4.x, KC1[6].y VEC_120
z: ADD T4.z, R22.y, -R4.x
w: AND_INT T4.w, R2.x, (0x00007FFF, 4.591634678e-41f).x VEC_201
t: ASHR T1.z, R3.w, (0x00000010, 2.242077543e-44f).y
67 x: CNDE_INT R12.x, R1.z, R4.y, R1.y
y: CNDE_INT R1.y, R2.z, R5.w, R13.x VEC_120
z: CNDE_INT R0.z, R3.x, R9.y, R1.w VEC_201
w: MULADD_e T7.w, R15.w, R0.z, R12.x VEC_021
t: MUL_e R32.x, PV66.x, (0x42800000, 64.0f).x
68 x: MULADD_e R6.x, R10.y, (0x40400000, 3.0f).x, -R6.x VEC_021
y: CNDE_INT R10.y, R8.w, R3.z, R6.z
z: MULADD_e T5.z, R9.x, R9.w, R3.y
w: MULADD_e R6.w, R11.y, (0x40400000, 3.0f).x, -R6.w VEC_102
t: SETGT T0.w, |R7.x|, (0x42480000, 50.0f).y
69 x: MULADD_e T5.x, R18.y, R12.y, R10.x
y: MULADD_e R12.y, R7.z, (0x40400000, 3.0f).x, -R8.z VEC_021
z: MULADD_e R7.z, R19.z, R1.x, R0.w VEC_201
w: FRACT T1.w, R7.x VEC_120
t: SETGT T5.w, |R14.w|, (0x42480000, 50.0f).y
70 x: MUL_e T1.x, R15.x, R23.y
y: MULADD_e R13.y, R16.x, R10.w, R10.z VEC_120
z: FRACT T0.z, R14.w
w: MULADD_e T6.w, R17.z, R13.y, R9.z VEC_102
t: MUL R3.y, R34.y, (0x3E22F983, 0.1591549367f).x
71 x: CNDE T3.x, R13.z, R13.z, R16.y
y: MUL_e R11.y, R21.w, R21.w
z: MULADD_e R11.z, R11.z, (0x40400000, 3.0f).x, -R12.z VEC_102
w: MUL R0.w, R35.y, (0x3E22F983, 0.1591549367f).y
t: MIN_DX10 R13.x, |R29.w|, 1.0f
72 x: MUL_e T4.x, R18.z, R24.y
y: MUL_e R5.y, KC0[0].z, R14.z
z: MUL_e R3.z, R19.x, R19.x
w: MUL_e R10.w, R17.y, R22.z VEC_021
t: MUL_e R6.z, R25.y, R25.y VEC_102
73 x: MIN_DX10 R4.x, |R38.z|, 1.0f
y: MIN_DX10 R4.y, |R45.y|, 1.0f
z: MUL_e R2.z, R24.z, R18.w VEC_102
w: MUL_e R31.w, R8.x, (0x42800000, 64.0f).x
t: MUL_e R8.z, R2.w, (0x38000100, 0.00003051850945f).y
74 x: ADD R11.x, R47.z, -R15.z
y: MUL_e R47.y, R11.x, (0x42800000, 64.0f).x
z: MUL_e R41.z, R6.y, (0x42800000, 64.0f).x
w: ADD R4.w, R15.z, KC1[6].z VEC_120
t: U_TO_F R8.x, R4.w
75 x: ADD R1.x, R22.x, -R11.w
y: ADD R6.y, R16.z, KC1[7].y
z: AND_INT R15.z, R14.y, (0x00007FFF, 4.591634678e-41f).x
w: ADD R11.w, R22.y, -R16.z VEC_120
t: CNDE R14.z, R14.x, R14.x, R19.w
76 x: SETGT T2.x, |R15.y|, (0x42480000, 50.0f).x
y: MUL_e R14.y, R16.w, R22.w
z: SETGT T3.z, |T2.z|, (0x42480000, 50.0f).x
w: FRACT T2.w, R15.y
t: FRACT T0.x, T2.z
77 x: MUL_e R14.x, R17.w, R26.y
y: MUL_e R9.y, R20.x, R20.x
z: MUL_e R16.z, R19.y, R23.z
w: MUL R2.w, R26.w, (0x3E22F983, 0.1591549367f).x VEC_201
t: MUL_e R38.y, KC0[0].z, R12.w
78 x: MUL R3.x, R4.z, (0x3E22F983, 0.1591549367f).x
y: MUL_e R8.y, R27.y, R27.y
z: MIN_DX10 R1.z, |R31.x|, 1.0f
w: MUL_e R5.w, R20.w, R20.w
t: MUL_e R27.x, R24.z, R7.y
79 x: MIN_DX10 R2.x, |R46.y|, 1.0f
y: MIN_DX10 R2.y, |R39.z|, 1.0f
z: MUL_e R40.z, T6.x, (0x42800000, 64.0f).x
w: MUL_e R1.w, T7.x, (0x38000100, 0.00003051850945f).y VEC_120
t: ADD R12.w, R47.z, -T3.w
80 x: ADD R10.x, T3.w, KC1[6].z
y: MUL_e R48.y, T4.z, (0x42800000, 64.0f).x
z: ADD R13.z, R22.y, -T2.y
w: ADD R13.w, R22.x, -R13.w
t: U_TO_F R12.z, T4.w
81 x: ADD R7.x, T2.y, KC1[7].y VEC_120
y: AND_INT R15.y, T1.z, (0x00007FFF, 4.591634678e-41f).x
z: CNDE_INT R9.z, T0.w, R7.x, T1.w
w: CNDE_INT R9.w, T2.x, R15.y, T2.w VEC_201
t: CNDE_INT R3.w, T3.z, T2.z, T0.x
82 x: MULADD_e R15.x, R9.x, (0x40400000, 3.0f).x, -T5.z
y: CNDE_INT R16.y, T5.w, R14.w, T0.z VEC_021
z: MULADD_e R10.z, R17.z, (0x40400000, 3.0f).x, -T6.w VEC_021
w: MULADD_e R14.w, R23.y, R15.x, T1.x
t: COS R8.w, R16.y
83 x: MULADD_e R9.x, R22.z, R17.y, R10.w
y: MULADD_e R17.y, R24.y, R18.z, T4.x VEC_021
z: MULADD_e R17.z, R15.w, (0x40400000, 3.0f).x, -T7.w VEC_021
w: MULADD_e R10.w, R18.y, (0x40400000, 3.0f).x, -T5.x VEC_210
t: SIN R18.z, T3.x
04 ALU: ADDR(531) CNT(121) KCACHE0(CB1:0-15) KCACHE1(CB0:0-15)
84 x: MULADD_e R14.x, R16.x, (0x40400000, 3.0f).x, -R13.y
y: MULADD_e R13.y, R19.z, (0x40400000, 3.0f).x, -R7.z VEC_021
z: MULADD_e T6.z, R22.w, R16.w, R14.y VEC_021
w: MULADD_e T0.w, R26.y, R17.w, R14.x
t: COS T3.w, R0.z
85 x: ADD*4 T4.x, -R12.y, 1.0f
y: CNDE T0.y, R20.y, R20.y, R0.z VEC_102
z: ADD T1.z, -R6.x, 1.0f
w: MULADD_e T1.w, R23.z, R19.y, R16.z VEC_021
t: CNDE T4.w, R17.x, R17.x, R12.x VEC_102
86 x: SETGT T3.x, |R3.y|, (0x42480000, 50.0f).x
y: SETGT T3.y, |R0.w|, (0x42480000, 50.0f).x
z: MUL_e T3.z, R21.w, R11.y VEC_120
w: FRACT T7.w, R3.y
t: FRACT T4.y, R0.w
87 x: MUL R17.x, R5.y, (0x3E22F983, 0.1591549367f).x
y: MUL_e R19.y, R13.x, R13.x
z: MUL_e T2.z, R19.x, R3.z VEC_120
w: MUL_e T6.w, R25.y, R6.z VEC_102
t: MUL_e R40.y, KC0[0].z, R18.w
88 x: MUL R16.x, R2.z, (0x3E22F983, 0.1591549367f).x
y: MUL_e R20.y, R4.x, R4.x
z: MUL_e R0.z, R4.y, R4.y
w: MIN_DX10 R18.w, |R31.w|, 1.0f
t: MUL_e R41.y, R24.z, R8.z
89 x: MUL_e R5.x, R8.x, (0x38000100, 0.00003051850945f).x
y: MIN_DX10 R14.y, |R41.z|, 1.0f
z: MIN_DX10 R16.z, |R47.y|, 1.0f
w: MUL_e R32.w, R11.x, (0x42800000, 64.0f).y VEC_120
t: ADD R8.x, R47.z, -R4.w
90 x: ADD R1.x, R4.w, KC1[7].z
y: MUL_e R50.y, R1.x, (0x42800000, 64.0f).x
z: MUL_e R42.z, R11.w, (0x42800000, 64.0f).x VEC_120
w: ADD R11.w, R22.x, -R21.y VEC_120
t: U_TO_F R4.w, R15.z
91 x: ADD*4 T1.x, -R6.w, 1.0f
y: ADD R6.y, R22.y, -R6.y
z: ADD T4.z, -R11.z, 1.0f
w: CNDE R6.w, R18.x, R18.x, R1.y
t: COS R21.y, R19.w
92 x: CNDE R11.x, R20.z, R20.z, R10.y
y: FRACT T2.y, R2.w
z: SETGT T0.z, |R2.w|, (0x42480000, 50.0f).x
w: MUL_e T2.w, R20.x, R9.y
t: SIN R18.x, R14.z
93 x: MUL_e R24.x, R1.z, R1.z
y: MUL R1.y, R38.y, (0x3E22F983, 0.1591549367f).x
z: FRACT T5.z, R3.x
w: SETGT T5.w, |R3.x|, (0x42480000, 50.0f).y
t: COS R14.z, R1.y
94 x: MUL_e T5.x, R20.w, R5.w
y: MUL R7.y, R27.x, (0x3E22F983, 0.1591549367f).x
z: MUL_e T7.z, R27.y, R8.y
w: MUL_e R7.w, KC0[0].z, R7.y VEC_021
t: MIN_DX10 R25.x, |R40.z|, 1.0f
95 x: MUL_e R30.x, R24.z, R1.w
y: MUL_e R49.y, R12.w, (0x42800000, 64.0f).x
z: MUL_e R7.z, R2.y, R2.y
w: MUL_e R19.w, R2.x, R2.x
t: MIN_DX10 R26.x, |R32.x|, 1.0f
96 x: ADD R6.x, R47.z, -R10.x
y: MUL_e R12.y, R12.z, (0x38000100, 0.00003051850945f).x VEC_120
z: MUL_e R43.z, R13.w, (0x42800000, 64.0f).y
w: MIN_DX10 R16.w, |R48.y|, 1.0f
t: MUL_e R52.y, R13.z, (0x42800000, 64.0f).y
97 x: ADD R10.x, R22.x, -R21.z
y: CNDE_INT R15.y, T0.z, R2.w, T2.y VEC_021
z: ADD R21.z, R22.y, -R7.x
w: ADD R2.w, R10.x, KC1[7].z VEC_201
t: U_TO_F R7.x, R15.y
98 x: CNDE_INT R3.x, T3.y, R0.w, T4.y
y: CNDE_INT R22.y, T5.w, R3.x, T5.z VEC_021
z: CNDE_INT R13.z, T3.x, R3.y, T7.w
w: MUL_e R0.w, T4.x, T1.z VEC_102
t: COS R12.z, R9.z
99 x: MULADD_e R19.x, R22.z, (0x40400000, 3.0f).x, -R9.x VEC_102
y: MULADD_e R24.y, R23.y, (0x40400000, 3.0f).x, -R14.w VEC_120
z: MULADD_e R22.z, R24.y, (0x40400000, 3.0f).x, -R17.y
w: MULADD_e R14.w, R3.z, R19.x, T2.z
t: MULADD_e R17.y, R5.w, R20.w, T5.x
100 x: MUL_e R15.x, T1.x, T4.z
y: MULADD_e R25.y, R11.y, R21.w, T3.z
z: MULADD_e R20.z, R26.y, (0x40400000, 3.0f).x, -T0.w VEC_210
w: MULADD_e R20.w, R6.z, R25.y, T6.w
t: ADD R21.w, -R15.x, 1.0f
101 x: MULADD_e R20.x, R9.y, R20.x, T2.w
y: MULADD_e R26.y, R8.y, R27.y, T7.z VEC_120
z: MULADD_e R11.z, R23.z, (0x40400000, 3.0f).x, -T1.w VEC_120
w: MULADD_e R12.w, R22.w, (0x40400000, 3.0f).x, -T6.z VEC_102
t: COS R23.z, R12.x
102 x: MUL_e R21.x, R8.w, T3.w
y: MUL_e R27.y, R18.z, T3.w
z: ADD*4 R18.z, -R17.z, 1.0f VEC_120
w: CNDE R13.w, R21.x, R21.x, R9.z
t: SIN R9.z, T0.y
103 x: CNDE R9.x, R28.y, R28.y, R16.y
y: ADD R28.y, -R10.z, 1.0f
z: FRACT R10.z, R17.x
w: SETGT R8.w, |R17.x|, (0x42480000, 50.0f).x
t: SIN R23.y, T4.w
104 x: SETGT R12.x, |R16.x|, (0x42480000, 50.0f).x
y: MUL R3.y, R40.y, (0x3E22F983, 0.1591549367f).y
z: FRACT R17.z, R16.x
w: MUL_e R22.w, R13.x, R19.y VEC_120
t: MUL_e R22.x, R18.w, R18.w
05 ALU: ADDR(652) CNT(124) KCACHE0(CB1:0-15)
105 x: MUL_e T0.x, R4.x, R20.y
y: MUL_e R44.y, KC0[0].z, R8.z
z: MUL R15.z, R41.y, (0x3E22F983, 0.1591549367f).x
w: MUL_e T1.w, R4.y, R0.z VEC_201
t: MIN_DX10 R29.x, |R32.w|, 1.0f
106 x: MUL_e R8.x, R24.z, R5.x
y: MUL_e R51.y, R8.x, (0x42800000, 64.0f).x
z: MUL_e R32.z, R14.y, R14.y
w: MUL_e R17.w, R16.z, R16.z VEC_120
t: MIN_DX10 R18.y, |R50.y|, 1.0f
107 x: MIN_DX10 R1.x, |R42.z|, 1.0f
y: ADD T2.y, R47.z, -R1.x VEC_120
z: MUL_e R5.z, R4.w, (0x38000100, 0.00003051850945f).x
w: MUL_e R34.w, R11.w, (0x42800000, 64.0f).y VEC_120
t: MUL_e R45.z, R6.y, (0x42800000, 64.0f).y
108 x: MUL_e T1.x, R21.y, R14.z
y: MUL_e T3.y, R18.x, R14.z
z: ADD T3.z, -R13.y, 1.0f VEC_120
w: CNDE R6.w, R29.y, R29.y, R9.w VEC_201
t: SIN T6.z, R6.w
109 x: ADD T3.x, -R14.x, 1.0f
y: SETGT T0.y, |R1.y|, (0x42480000, 50.0f).x
z: CNDE R8.z, R30.y, R30.y, R3.w VEC_120
w: ADD*4 T3.w, -R10.w, 1.0f VEC_120
t: COS R10.w, R10.y
110 x: SETGT T5.x, |R7.y|, (0x42480000, 50.0f).x
y: MUL_e T4.y, R1.z, R24.x
z: FRACT T7.z, R7.y
w: FRACT T4.w, R1.y VEC_120
t: SIN R14.z, R11.x
111 x: MUL_e T4.x, R2.x, R19.w
y: MUL_e R10.y, R25.x, R25.x VEC_120
z: MUL_e T4.z, R2.y, R7.z
w: MUL R4.w, R7.w, (0x3E22F983, 0.1591549367f).x
t: COS R9.w, R9.w
112 x: MUL R11.x, R30.x, (0x3E22F983, 0.1591549367f).x
y: MUL_e R13.y, KC0[0].z, R1.w VEC_102
z: MIN_DX10 R33.z, |R49.y|, 1.0f
w: MUL_e R11.w, R26.x, R26.x VEC_120
t: MUL_e R14.x, R16.w, R16.w
113 x: MUL_e R6.x, R24.z, R12.y
y: MIN_DX10 R30.y, |R43.z|, 1.0f VEC_120
z: MUL_e R44.z, R6.x, (0x42800000, 64.0f).x
w: MIN_DX10 R15.w, |R52.y|, 1.0f
t: MUL_e R1.w, R7.x, (0x38000100, 0.00003051850945f).y
114 x: MUL_e R17.x, R10.x, (0x42800000, 64.0f).x
y: MUL_e R1.y, R21.z, (0x42800000, 64.0f).x VEC_120
z: CNDE_INT R21.z, R8.w, R17.x, R10.z VEC_210
w: ADD R2.w, R47.z, -R2.w VEC_210
t: CNDE_INT R27.z, T0.y, R1.y, T4.w
115 x: DOT4_e ____, R21.x, R31.y
y: DOT4_e ____, R27.y, R23.w
z: DOT4_e R9.z, R9.z, R24.w VEC_120
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: CNDE_INT R27.y, R12.x, R16.x, R17.z
116 x: DOT4_e ____, T1.x, R32.y
y: DOT4_e ____, T3.y, R25.w VEC_201
z: DOT4_e ____, T6.z, R33.y VEC_102
w: DOT4_e R0.w, (0x80000000, -0.0f).x, 0.0f
t: MUL_e R17.z, R0.w, R21.w
117 x: MUL_e R13.x, R18.z, R28.y
y: MULADD_e R7.y, R19.y, R13.x, R22.w VEC_021
z: MULADD_e R10.z, R3.z, (0x40400000, 3.0f).x, -R14.w VEC_210
w: CNDE_INT R8.w, T5.x, R7.y, T7.z VEC_021
t: MUL_e R3.z, T3.w, T3.x
118 x: MULADD_e R2.x, R11.y, (0x40400000, 3.0f).x, -R25.y
y: MULADD_e R4.y, R0.z, R4.y, T1.w VEC_210
z: MULADD_e R19.z, R6.z, (0x40400000, 3.0f).x, -R20.w VEC_021
w: MUL_e R14.w, R15.x, T3.z VEC_210
t: MULADD_e R22.w, R19.w, R2.x, T4.x
119 x: MULADD_e R15.x, R5.w, (0x40400000, 3.0f).x, -R17.y
y: MULADD_e R9.y, R9.y, (0x40400000, 3.0f).x, -R20.x VEC_021
z: MULADD_e R6.z, R20.y, R4.x, T0.x VEC_120
w: MUL_e R13.w, R23.z, R12.z
t: SIN R23.z, R13.w
120 x: MULADD_e R20.x, R8.y, (0x40400000, 3.0f).x, -R26.y
y: MULADD_e R8.y, R24.x, R1.z, T4.y VEC_021
z: ADD*4 R1.z, -R19.x, 1.0f VEC_120
w: ADD R20.w, -R22.z, 1.0f
t: SIN R18.z, R9.x
121 x: ADD R4.x, -R24.y, 1.0f
y: MUL_e R2.y, R23.y, R12.z VEC_120
z: MUL_e R22.z, R18.w, R22.x VEC_102
w: MULADD_e R25.w, R7.z, R2.y, T4.z VEC_021
t: MUL_e R23.y, R29.x, R29.x
122 x: FRACT R19.x, R15.z
y: CNDE R16.y, R34.y, R34.y, R13.z
z: SETGT R12.z, |R15.z|, (0x42480000, 50.0f).x
w: CNDE R24.w, R35.y, R35.y, R3.x VEC_120
t: COS R21.w, R16.y
123 x: SETGT R9.x, |R3.y|, (0x42480000, 50.0f).x
y: MUL R34.y, R44.y, (0x3E22F983, 0.1591549367f).y VEC_120
z: MUL_e R13.z, R16.z, R17.w
w: FRACT R5.w, R3.y
t: COS R23.w, R13.z
124 x: MUL_e R5.x, R14.y, R32.z
y: MUL_e R17.y, KC0[0].z, R5.x
z: MUL R31.z, R8.x, (0x3E22F983, 0.1591549367f).x
w: MIN_DX10 R30.w, |R51.y|, 1.0f VEC_120
t: MUL_e R36.z, R18.y, R18.y VEC_102
125 x: MUL_e R21.x, R24.z, R5.z
y: MUL_e R35.y, R1.x, R1.x
z: MIN_DX10 R37.z, |R34.w|, 1.0f
w: MUL_e R33.w, T2.y, (0x42800000, 64.0f).x
t: MIN_DX10 R24.y, |R45.z|, 1.0f
06 ALU: ADDR(776) CNT(126) KCACHE0(CB1:0-15)
126 x: MUL_e T0.x, R10.w, R9.w
y: MUL_e T4.y, R14.z, R9.w
z: ADD*4 T6.z, -R20.z, 1.0f VEC_120
w: ADD T3.w, -R11.z, 1.0f VEC_201
t: SIN T3.z, R6.w
127 x: CNDE T1.x, R4.z, R4.z, R22.y
y: MUL_e T3.y, R25.x, R10.y
z: ADD T7.z, -R12.w, 1.0f
w: CNDE T4.w, R26.w, R26.w, R15.y VEC_120
t: COS T0.y, R3.w
128 x: FRACT T4.x, R11.x
y: FRACT T2.y, R4.w
z: SETGT T4.z, |R4.w|, (0x42480000, 50.0f).x
w: SETGT T1.w, |R11.x|, (0x42480000, 50.0f).x
t: SIN T5.x, R8.z
129 x: MUL_e T3.x, R16.w, R14.x
y: MUL R15.y, R13.y, (0x3E22F983, 0.1591549367f).x
z: MUL_e T2.z, R26.x, R11.w
w: MUL_e R3.w, R33.z, R33.z
t: COS T1.z, R15.y
130 x: MIN_DX10 R12.x, |R44.z|, 1.0f
y: MUL_e R12.y, R30.y, R30.y
z: MUL R8.z, R6.x, (0x3E22F983, 0.1591549367f).x
w: MUL_e R12.w, KC0[0].z, R12.y
t: MUL_e R4.z, R15.w, R15.w
131 x: MUL_e R7.x, R2.w, (0x42800000, 64.0f).x VEC_120
y: MIN_DX10 R3.y, |R17.x|, 1.0f
z: MUL_e R11.z, R24.z, R1.w VEC_021
w: MIN_DX10 R2.w, |R1.y|, 1.0f
t: CNDE_INT R24.z, R9.x, R3.y, R5.w
132 x: DOT4_e ____, R13.w, R36.y
y: DOT4_e ____, R2.y, R25.z VEC_201
z: DOT4_e ____, R23.z, R27.w VEC_120
w: DOT4_e T2.w, (0x80000000, -0.0f).x, 0.0f
t: CNDE_INT R9.x, T4.z, R4.w, T2.y
133 x: DOT4_e ____, T0.x, R26.z
y: DOT4_e R37.y, T4.y, R37.y
z: DOT4_e ____, T3.z, R23.x
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: MUL_e R20.w, R1.z, R20.w
134 x: CNDE_INT R23.x, R12.z, R15.z, R19.x VEC_120
y: CNDE_INT R19.y, T1.w, R11.x, T4.x
z: MULADD_e R12.z, R19.y, (0x40400000, 3.0f).x, -R7.y
w: MULADD_e R0.w, R0.z, (0x40400000, 3.0f).x, -R4.y VEC_021
t: MULADD_e R0.z, R0.w, R14.w, 0.0f
135 x: MULADD_e R11.x, R20.y, (0x40400000, 3.0f).x, -R6.z
y: MULADD_e T4.y, R9.z, R17.z, 0.0f
z: MULADD_e R6.z, R24.x, (0x40400000, 3.0f).x, -R8.y
w: MUL_e T1.w, R13.x, R4.x VEC_120
t: MULADD_e R9.z, R19.w, (0x40400000, 3.0f).x, -R22.w VEC_021
136 x: ADD R2.x, -R2.x, 1.0f
y: MULADD_e R8.y, R22.x, R18.w, R22.z VEC_120
z: MULADD_e R16.z, R10.y, R25.x, T3.y VEC_021
w: MULADD_e R18.w, R17.w, R16.z, R13.z
t: SIN R13.z, R16.y
137 x: MULADD_e R13.x, R14.x, R16.w, T3.x
y: MUL_e R14.y, R3.z, T7.z
z: MUL_e R3.z, R21.w, R23.w VEC_021
w: MULADD_e R21.w, R32.z, R14.y, R5.x VEC_201
t: SETGT R16.w, |R34.y|, (0x42480000, 50.0f).x
138 x: MUL_e R4.x, T6.z, T3.w
y: MULADD_e R4.y, R11.w, R26.x, T2.z
z: MULADD_e R17.z, R7.z, (0x40400000, 3.0f).x, -R25.w VEC_102
w: FRACT R25.w, R34.y
t: COS R7.z, R3.x
139 x: ADD*4 R26.x, -R10.z, 1.0f
y: MUL_e R5.y, R18.z, R23.w VEC_102
z: MUL_e R10.z, R29.x, R23.y
w: CNDE R24.w, R5.y, R5.y, R21.z
t: SIN R7.y, R24.w
140 x: CNDE R25.x, R2.z, R2.z, R27.y
y: ADD R16.y, -R19.z, 1.0f VEC_120
z: MUL R1.z, R17.y, (0x3E22F983, 0.1591549367f).x
w: MUL_e R27.w, R30.w, R30.w
t: COS R21.z, R21.z
141 x: MUL_e R5.x, R18.y, R36.z VEC_021
y: MUL_e R20.y, R1.x, R35.y
z: FRACT R2.z, R31.z
w: SETGT R23.w, |R31.z|, (0x42480000, 50.0f).x
t: MUL_e R10.x, KC0[0].z, R5.z
142 x: MIN_DX10 R16.x, |R33.w|, 1.0f
y: MUL_e R2.y, R37.z, R37.z
z: MUL_e R23.z, R24.y, R24.y
w: MUL R13.w, R21.x, (0x3E22F983, 0.1591549367f).x
t: MUL_e R3.x, T0.y, T1.z
143 x: ADD R20.x, -R9.y, 1.0f
y: MUL_e R9.y, T5.x, T1.z
z: CNDE R22.z, R38.y, R38.y, R27.z VEC_120
w: ADD*4 R19.w, -R20.x, 1.0f VEC_120
t: SIN R5.z, T4.w
144 x: SETGT R15.x, |R15.y|, (0x42480000, 50.0f).x
y: FRACT R22.y, R15.y
z: CNDE R15.z, R27.x, R27.x, R8.w
w: ADD R22.w, -R15.x, 1.0f VEC_120
t: COS R14.w, R22.y
145 x: SETGT R27.x, |R8.z|, (0x42480000, 50.0f).x
y: MUL_e R38.y, R33.z, R3.w VEC_120
z: FRACT R19.z, R8.z
w: MUL R4.w, R12.w, (0x3E22F983, 0.1591549367f).y
t: SIN R18.z, T1.x
146 x: MUL_e R24.x, R30.y, R12.y
y: MUL_e R26.y, R12.x, R12.x
z: MUL_e R27.z, R15.w, R4.z
w: MUL_e R26.w, KC0[0].z, R1.w
t: COS R1.w, R27.z
147 x: MUL R19.x, R11.z, (0x3E22F983, 0.1591549367f).x
y: MUL_e R36.y, R3.y, R3.y VEC_120
z: MIN_DX10 R25.z, |R7.x|, 1.0f
w: MUL_e R5.w, R2.w, R2.w
t: MULADD_e R26.z, T2.w, T1.w, T4.y
07 ALU: ADDR(902) CNT(126)
148 x: DOT4_e ____, R3.z, R39.y VEC_021
y: DOT4_e ____, R5.y, R28.z
z: DOT4_e T3.z, R13.z, R28.x VEC_201
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: CNDE_INT R5.y, R16.w, R34.y, R25.w
149 x: DOT4_e ____, R3.x, R29.z
y: DOT4_e ____, R9.y, R28.w VEC_201
z: DOT4_e ____, R5.z, R30.z VEC_021
w: DOT4_e T2.w, (0x80000000, -0.0f).x, 0.0f
t: CNDE_INT R5.z, R15.x, R15.y, R22.y
150 x: MULADD_e T5.x, R37.y, R14.y, R0.z VEC_021
y: CNDE_INT R14.y, R23.w, R31.z, R2.z VEC_120
z: MUL_e T4.z, R20.w, R2.x
w: MUL_e T0.w, R26.x, R16.y
t: SIN T2.z, R24.w
151 x: MULADD_e T2.x, R22.x, (0x40400000, 3.0f).x, -R8.y VEC_021
y: MULADD_e T7.y, R32.z, (0x40400000, 3.0f).x, -R21.w VEC_201
z: MULADD_e R19.z, R17.w, (0x40400000, 3.0f).x, -R18.w
w: CNDE_INT R17.w, R27.x, R8.z, R19.z VEC_201
t: MULADD_e R8.z, R35.y, R1.x, R20.y
152 x: MULADD_e R5.x, R36.z, R18.y, R5.x VEC_021
y: MUL_e R10.y, R19.w, R22.w
z: MULADD_e R10.z, R23.y, R29.x, R10.z VEC_102
w: MULADD_e R19.w, R10.y, (0x40400000, 3.0f).x, -R16.z VEC_021
t: SIN R16.z, R25.x
153 x: MULADD_e R4.x, R11.w, (0x40400000, 3.0f).x, -R4.y VEC_021
y: MULADD_e T5.y, R3.w, R33.z, R38.y VEC_120
z: MULADD_e R27.z, R4.z, R15.w, R27.z VEC_021
w: MUL_e T4.w, R4.x, R20.x
t: COS R11.w, R27.y
154 x: MULADD_e R13.x, R14.x, (0x40400000, 3.0f).x, -R13.x
y: MUL_e T4.y, R7.z, R21.z VEC_102
z: MUL_e T1.z, R7.y, R21.z VEC_102
w: MULADD_e T6.w, R12.y, R30.y, R24.x VEC_021
t: ADD T4.x, -R12.z, 1.0f
155 x: ADD*4 T7.x, -R0.w, 1.0f
y: CNDE T6.y, R40.y, R40.y, R24.z
z: ADD T0.z, -R11.x, 1.0f
w: CNDE R0.w, R41.y, R41.y, R23.x VEC_102
t: COS R15.w, R24.z
156 x: SETGT T1.x, |R1.z|, (0x42480000, 50.0f).x
y: SETGT T3.y, |R13.w|, (0x42480000, 50.0f).x VEC_201
z: MUL_e R24.z, R30.w, R27.w
w: FRACT T1.w, R1.z
t: FRACT T2.y, R13.w
157 x: MUL R14.x, R10.x, (0x3E22F983, 0.1591549367f).x
y: MUL_e R30.y, R16.x, R16.x VEC_120
z: MUL_e R7.z, R37.z, R2.y
w: MUL_e T7.w, R24.y, R23.z
t: MUL_e T3.x, R14.w, R1.w
158 x: CNDE R11.x, R30.x, R30.x, R19.y
y: MUL_e T0.y, R18.z, R1.w
z: ADD R6.z, -R6.z, 1.0f VEC_120
w: CNDE R1.w, R7.w, R7.w, R9.x
t: SIN T6.z, R22.z
159 x: ADD T0.x, -R17.z, 1.0f
y: ADD*4 T1.y, -R9.z, 1.0f VEC_120
z: SETGT T7.z, |R4.w|, (0x42480000, 50.0f).x
w: FRACT T3.w, R4.w
t: COS R40.y, R8.w
160 x: SETGT T6.x, |R19.x|, (0x42480000, 50.0f).x
y: MUL R41.y, R26.w, (0x3E22F983, 0.1591549367f).y
z: FRACT T5.z, R19.x
w: MUL_e T5.w, R12.x, R26.y VEC_120
t: SIN R30.x, R15.z
161 x: MUL_e T1.x, R2.w, R5.w
y: CNDE_INT R7.y, T1.x, R1.z, T1.w
z: MUL_e R1.z, R3.y, R36.y
w: MUL_e R8.w, R25.z, R25.z
t: COS R15.z, R9.x
162 x: DOT4_e ____, T4.y, R42.y
y: DOT4_e ____, T1.z, R34.z
z: DOT4_e ____, T2.z, R29.w VEC_201
w: DOT4_e R29.w, (0x80000000, -0.0f).x, 0.0f
t: MULADD_e R34.z, T2.w, T4.w, T5.x
163 x: DOT4_e ____, T3.x, R43.y
y: DOT4_e R42.y, T0.y, R35.z
z: DOT4_e ____, T6.z, R31.x
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: CNDE_INT R9.x, T7.z, R4.w, T3.w
164 x: CNDE_INT R24.x, T3.y, R13.w, T2.y
y: MULADD_e R43.y, T3.z, T4.z, R26.z
z: MUL_e R26.z, T1.y, T0.x VEC_102
w: MUL_e R13.w, T0.w, T4.x
t: ADD R4.w, -T2.x, 1.0f
165 x: MULADD_e R19.x, R23.y, (0x40400000, 3.0f).x, -R10.z
y: CNDE_INT R12.y, T6.x, R19.x, T5.z VEC_021
z: MULADD_e R10.z, R3.w, (0x40400000, 3.0f).x, -T5.y
w: MUL_e R3.w, T7.x, T0.z VEC_102
t: MULADD_e R35.z, R12.y, (0x40400000, 3.0f).x, -T6.w VEC_021
166 x: MULADD_e R31.x, R35.y, (0x40400000, 3.0f).x, -R8.z
y: MULADD_e R35.y, R36.z, (0x40400000, 3.0f).x, -R5.x VEC_120
z: MULADD_e R24.z, R27.w, R30.w, R24.z VEC_120
w: MULADD_e R30.w, R26.y, R12.x, T5.w VEC_120
t: SIN R8.z, T6.y
167 x: MULADD_e R12.x, R23.z, R24.y, T7.w VEC_021
y: MULADD_e R24.y, R5.w, R2.w, T1.x VEC_021
z: ADD R7.z, -T7.y, 1.0f VEC_120
w: MULADD_e R2.w, R2.y, R37.z, R7.z
t: COS R37.z, R23.x
168 x: MUL_e R23.x, R11.w, R15.w
y: MUL_e R10.y, R10.y, R6.z
z: MULADD_e R4.z, R4.z, (0x40400000, 3.0f).x, -R27.z
w: CNDE R0.w, R44.y, R44.y, R5.y VEC_102
t: SIN R44.y, R0.w
169 x: MUL_e R5.x, R16.z, R15.w
y: MULADD_e R3.y, R36.y, R3.y, R1.z
z: ADD*4 R1.z, -R19.z, 1.0f VEC_120
w: SETGT R15.w, |R14.x|, (0x42480000, 50.0f).x
t: COS R16.z, R5.y
08 ALU: ADDR(1028) CNT(122)
170 x: CNDE T7.x, R8.x, R8.x, R14.y
y: MUL_e T7.y, R40.y, R15.z
z: FRACT T0.z, R14.x VEC_120
w: MUL_e T7.w, R16.x, R30.y VEC_210
t: SIN T5.z, R1.w
171 x: ADD T6.x, -R19.w, 1.0f
y: MUL_e T6.y, R30.x, R15.z
z: CNDE T4.z, R13.y, R13.y, R5.z
w: ADD*4 T5.w, -R4.x, 1.0f VEC_120
t: COS T0.w, R19.y
172 x: SETGT T1.x, |R41.y|, (0x42480000, 50.0f).x
y: FRACT T5.y, R41.y
z: CNDE T6.z, R6.x, R6.x, R17.w
w: ADD T6.w, -R13.x, 1.0f VEC_120
t: SIN T3.z, R11.x
173 x: MULADD_e T0.x, R42.y, R10.y, R34.z VEC_021
y: MUL_e T1.y, R25.z, R8.w
z: MULADD_e T2.z, R29.w, R13.w, R43.y VEC_021
w: ADD T4.w, -R19.x, 1.0f
t: COS T3.w, R5.z
174 x: DOT4_e ____, R23.x, R45.y
y: DOT4_e ____, R5.x, R38.z VEC_210
z: DOT4_e T7.z, R8.z, R31.w VEC_201
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: CNDE_INT T4.y, R15.w, R14.x, T0.z
175 x: DOT4_e ____, T7.y, R46.y
y: DOT4_e ____, T6.y, R39.z VEC_201
z: DOT4_e ____, T5.z, R40.z VEC_120
w: DOT4_e T2.w, (0x80000000, -0.0f).x, 0.0f
t: MUL_e T1.z, R3.w, R4.w
176 x: MUL_e T1.x, R1.z, R7.z
y: MULADD_e T2.y, R27.w, (0x40400000, 3.0f).x, -R24.z
z: CNDE_INT R1.z, T1.x, R41.y, T5.y
w: MULADD_e R0.w, R2.y, (0x40400000, 3.0f).x, -R2.w VEC_021
t: SIN T0.z, R0.w
177 x: MULADD_e T6.x, R30.y, R16.x, T7.w
y: MUL_e T5.y, T5.w, T6.w
z: MULADD_e R23.z, R23.z, (0x40400000, 3.0f).x, -R12.x
w: MUL_e T5.w, R26.z, T6.x VEC_102
t: MULADD_e T4.x, R36.y, (0x40400000, 3.0f).x, -R3.y VEC_021
178 x: MULADD_e T3.x, R5.w, (0x40400000, 3.0f).x, -R24.y
y: MULADD_e T0.y, R26.y, (0x40400000, 3.0f).x, -R30.w VEC_021
z: MULADD_e R25.z, R8.w, R25.z, T1.y VEC_201
w: MUL_e T6.w, R37.z, R16.z VEC_120
t: SIN R37.z, T7.x
179 x: ADD*4 T7.x, -R35.y, 1.0f
y: MUL_e T1.y, R44.y, R16.z VEC_120
z: ADD T5.z, -R31.x, 1.0f
w: CNDE T1.w, R21.x, R21.x, R24.x VEC_102
t: COS T7.w, R14.y
180 x: MUL_e T2.x, T0.w, T3.w
y: CNDE T7.y, R17.y, R17.y, R7.y
z: MUL_e T3.z, T3.z, T3.w
w: ADD T0.w, -R10.z, 1.0f VEC_120
t: COS T3.w, R7.y
181 x: ADD T0.x, -R4.z, 1.0f
y: ADD*4 T6.y, -R35.z, 1.0f VEC_120
z: MULADD_e R4.z, T2.w, T5.w, T0.x
w: CNDE T2.w, R12.w, R12.w, R9.x VEC_201
t: SIN T4.z, T4.z
182 x: CNDE T6.x, R11.z, R11.z, R12.y
y: MUL_e R30.y, T7.x, T5.z
z: MULADD_e R11.z, R30.y, (0x40400000, 3.0f).x, -T6.x
w: MUL_e T4.w, T1.x, T4.w VEC_102
t: COS T3.y, R17.w
183 x: DOT4_e ____, T6.w, R47.y
y: DOT4_e ____, T1.y, R41.z
z: DOT4_e ____, T0.z, R32.w
w: DOT4_e T6.w, (0x80000000, -0.0f).x, 0.0f
t: SIN T1.x, T6.z
184 x: DOT4_e ____, T2.x, R32.x
y: DOT4_e T1.y, T3.z, R48.y
z: DOT4_e ____, T4.z, R49.y VEC_120
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: COS T3.z, R9.x
185 x: MUL_e T0.x, T5.y, T0.w
y: MULADD_e T5.y, T7.z, T1.z, T2.z
z: MUL_e T2.z, T6.y, T0.x VEC_120
w: MUL_e T0.w, T7.w, T3.w VEC_021
t: SIN T1.z, T7.y
186 x: ADD T2.x, -T2.y, 1.0f
y: MUL_e T2.y, R37.z, T3.w
z: CNDE R41.z, R10.x, R10.x, T4.y
w: MULADD_e R8.w, R8.w, (0x40400000, 3.0f).x, -R25.z
t: COS R25.z, R24.x
187 x: ADD R10.x, -R23.z, 1.0f
y: ADD*4 R48.y, -R0.w, 1.0f
z: ADD T7.z, -T0.y, 1.0f
w: MUL_e T1.w, T3.y, T3.z VEC_120
t: SIN R49.y, T1.w
188 x: CNDE R9.x, R26.w, R26.w, R1.z
y: MUL_e T4.y, T1.x, T3.z
z: ADD R23.z, -T3.x, 1.0f VEC_120
w: ADD*4 R26.w, -T4.x, 1.0f VEC_201
t: COS R37.z, T4.y
189 x: DOT4_e ____, T0.w, R50.y
y: DOT4_e ____, T2.y, R42.z
z: DOT4_e R42.z, T1.z, R51.y VEC_021
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: SIN ____, T2.w
190 x: DOT4_e ____, T1.w, R43.z
y: DOT4_e ____, T4.y, R52.y
z: DOT4_e ____, PS189, R44.z VEC_021
w: DOT4_e R32.w, (0x80000000, -0.0f).x, 0.0f
t: COS R0.w, R12.y
191 x: MULADD_e R24.x, T1.y, T0.x, R4.z
y: MUL_e R30.y, R30.y, T2.x VEC_102
z: MULADD_e R43.z, T6.w, T4.w, T5.y
w: MUL_e R17.w, T2.z, T7.z
t: SIN R4.z, T6.x
09 ALU: ADDR(1150) CNT(44)
192 x: MUL_e T2.x, R48.y, R10.x
y: MUL_e T5.y, R49.y, R37.z VEC_102
z: MUL_e ____, R26.w, R23.z
w: ADD ____, -R8.w, 1.0f VEC_120
t: COS T4.w, R1.z
193 x: MUL_e T0.x, R25.z, R37.z
y: MUL_e T1.y, R0.w, PS192
z: MULADD_e T2.z, R32.w, R17.w, R24.x VEC_120
w: MUL_e T6.w, PV192.z, PV192.w
t: SIN ____, R41.z
194 x: ADD T6.x, -R11.z, 1.0f
y: MUL_e ____, R4.z, T4.w VEC_120
z: MUL_e ____, PS193, R33.w
t: SIN ____, R9.x
195 x: DOT4_e ____, T1.y, R17.x
y: DOT4_e ____, PV194.y, R1.y
z: DOT4_e ____, PS194, R7.x VEC_021
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: MULADD_e ____, T5.y, R45.z, PV194.z
196 x: MUL_e ____, T2.x, T6.x
y: MULADD_e ____, R42.z, R30.y, R43.z
z: MULADD_e T2.z, PV195.x, T6.w, T2.z VEC_021
w: MULADD_e ____, T0.x, R34.w, PS195 VEC_201
197 z: MULADD_e ____, PV196.w, PV196.x, PV196.y
198 w: MULADD_e T6.w, T2.z, 0.5, PV197.z
199 x: INTERP_XY R30.x, R0.y, Param3.x VEC_210
y: INTERP_XY R30.y, R0.x, Param3.x VEC_210
z: INTERP_XY ____, R0.y, Param3.x VEC_210
w: INTERP_XY ____, R0.x, Param3.x VEC_210
200 z: MULADD_e ____, T6.w, (0x40400000, 3.0f).x, R46.z
201 y: MULADD_e T1.y, R47.z, (0x42F00000, 120.0f).x, PV200.z
202 x: MUL T0.x, PV201.y, (0x3E22F983, 0.1591549367f).x
203 z: FRACT ____, PV202.x
w: SETGT ____, |PV202.x|, (0x42480000, 50.0f).x
204 y: CNDE_INT ____, PV203.w, T0.x, PV203.z
205 x: CNDE ____, T1.y, T1.y, PV204.y
206 t: SIN ____, PV205.x
207 w: MULADD_e R34.w, PS206, (0x3F266666, 0.6499999762f).y, (0x3EB33333, 0.349999994f).x
10 TEX: ADDR(1264) CNT(2) VALID_PIX
208 SAMPLE R30.xyz_, R30.xy0x, t1, s0
209 SAMPLE_LZ R47.xyz_, R34.ww0w, t0, s1
11 ALU: ADDR(1194) CNT(65) KCACHE0(CB1:0-15)
210 x: INTERP_XY T1.x, R0.y, Param2.x VEC_210
y: INTERP_XY T1.y, R0.x, Param2.x VEC_210
z: INTERP_XY ____, R0.y, Param2.x VEC_210
w: INTERP_XY ____, R0.x, Param2.x VEC_210
211 x: INTERP_ZW ____, R0.y, Param2.x VEC_210
y: INTERP_ZW ____, R0.x, Param2.x VEC_210
z: INTERP_ZW T2.z, R0.y, Param2.x VEC_210
w: INTERP_ZW ____, R0.x, Param2.x VEC_210
212 x: DOT4_e ____, T1.x, T1.x
y: DOT4_e ____, T1.y, T1.y
z: DOT4_e ____, PV211.z, PV211.z
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: MULADD_e T2.x, R30.x, (0x40000000, 2.0f).y, -1.0f
213 x: ADD T0.x, -PV212.x, 1.0f
y: MULADD_e T5.y, R30.y, (0x40000000, 2.0f).x, -1.0f
z: MULADD_e T7.z, R30.z, (0x40000000, 2.0f).x, -1.0f
t: RSQ_e T6.w, PV212.x
214 x: INTERP_XY T6.x, R0.y, Param1.x VEC_210
y: INTERP_XY T6.y, R0.x, Param1.x VEC_210
z: INTERP_XY ____, R0.y, Param1.x VEC_210
w: INTERP_XY ____, R0.x, Param1.x VEC_210
215 x: INTERP_ZW ____, R0.y, Param1.x VEC_210
y: INTERP_ZW ____, R0.x, Param1.x VEC_210
z: INTERP_ZW T1.z, R0.y, Param1.x VEC_210
w: INTERP_ZW ____, R0.x, Param1.x VEC_210
216 x: MUL_e T0.x, T1.x, T6.w
y: MUL_e T1.y, T1.y, T6.w
z: MUL_e T2.z, T2.z, T6.w
w: MAX_DX10 T4.w, T0.x, 0.0f VEC_120
t: MUL_e ____, PV215.z, PV215.z
217 x: DOT4_e ____, PV216.x, T2.x
y: DOT4_e ____, PV216.y, T5.y VEC_102
z: DOT4_e ____, PV216.z, T7.z
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f
t: MULADD_e ____, T6.y, T6.y, PS216
218 x: MOV*2 ____, PV217.x
y: MOV ____, PV217.x CLAMP
w: MULADD_e ____, T6.x, T6.x, PS217
219 x: MULADD_e T0.x, T0.x, -PV218.x, T2.x
y: MULADD_e T1.y, T1.y, -PV218.x, T5.y
z: MULADD_e T2.z, T2.z, -PV218.x, T7.z
w: ADD ____, PV218.y, (0x3ECCCCCD, 0.400000006f).x
t: RSQ_e ____, PV218.w
220 x: MUL_e ____, T6.x, PS219
y: MUL_e ____, T6.y, PS219
z: MUL_e ____, T1.z, PS219
w: MIN_DX10 T6.w, PV219.w, 1.0f
221 x: DOT4_e ____, PV220.x, T0.x CLAMP
y: DOT4_e ____, PV220.y, T1.y CLAMP
z: DOT4_e ____, PV220.z, T2.z CLAMP
w: DOT4_e ____, (0x80000000, -0.0f).x, 0.0f CLAMP
222 t: LOG_e ____, PV221.x
223 w: MUL_e ____, PS222, KC0[0].y
224 t: EXP_e ____, PV223.w
225 x: MULADD_e ____, T6.w, R47.z, PS224
y: MULADD_e ____, T6.w, R47.y, PS224
z: MULADD_e ____, T6.w, R47.x, PS224
226 x: MUL_e R53.x, T4.w, PV225.z
y: MUL_e R53.y, T4.w, PV225.y
z: MUL_e R53.z, T4.w, PV225.x
12 EXP_DONE: PIX0, R53
END_OF_PROGRAM
Your figure is arrived at with fairytale accounting. Much of what is eliminated is simply transplanted, like the LUTs, squarer, cuber, exponent processing (exp, log, div).
I specifically kept those things. Seems you totally forgot the int32 multiplier.
You're going to increase the longest path,
Which is?
Well now you're reducing throughput even more. With the current architecture, in two cycles you can do 1 or 2 transcendentals and 9 or 8 regular ops. With your modification, you can only do one transcendental (and maybe one regular op alongside the square/cube cycle).
Transcendental throughput isn't very important.
Whole SIMD staggering is actually isomorphic to the quasi-scalar architecture I was proposing. It has the same requirement of needing more active batches. Four cycles of stagger between x and y, y and z, and z and w, for example, would have 20 total cycles of latency before proceeding to the next instruction group, so you'd need 5 active wavefronts instead of 2.
The stagger is 1 cycle twixt lanes as far as I can tell.
I'm pretty sure that ATI isn't doing this, though. The restrictions on dependent math are a big hint, IMO.
Which restrictions? It can do ADDs, MULs and MADs.
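The wavefront counts in the staggering argument can be checked with a little arithmetic. This is only a rough sketch: it assumes five lanes (x, y, z, w, t) and that a wavefront occupies each lane for 4 cycles. `LANES`, `ISSUE_CYCLES` and the function names are made up for illustration and are not taken from any ATI documentation.

```python
import math

LANES = 5          # assumption: x, y, z, w and t
ISSUE_CYCLES = 4   # assumption: a wavefront occupies a lane for 4 cycles


def group_latency(stagger):
    """Cycles from the first lane starting until the last lane finishes."""
    return (LANES - 1) * stagger + ISSUE_CYCLES


def wavefronts_needed(stagger):
    """Wavefronts to interleave so the next instruction group never waits."""
    return math.ceil(group_latency(stagger) / ISSUE_CYCLES)


for stagger in (1, 4):
    print(stagger, group_latency(stagger), wavefronts_needed(stagger))
```

Under these assumptions a 1-cycle stagger gives 8 cycles and 2 wavefronts, while a 4-cycle stagger gives the 20 cycles and 5 wavefronts mentioned in the quoted post.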
How does one get a single core to perform 32 reads per clock if it can issue 'only' one load per instruction (and one instruction per clock..ok four clocks, but you get the gist)?
Because the LDS is banked 32 way.
That's an effect, not the cause. I wasn't aware of the LOAD2/WRITE2 instructions.
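To illustrate how 32 banks and a two-address read could add up to 32 reads per clock, here is a toy model. It assumes 16 lanes issue per clock, each supplying two dword addresses, and that the bank is simply the dword address modulo 32. The lane count, the bank mapping and every name in the code are assumptions for illustration, not taken from AMD documentation.

```python
from collections import Counter

NUM_BANKS = 32         # from the thread: the LDS is banked 32 way
LANES_PER_CLOCK = 16   # assumption


def worst_bank_load(addresses):
    """Largest number of requests hitting one bank (1 means conflict free)."""
    banks = Counter(addr % NUM_BANKS for addr in addresses)
    return max(banks.values())


def read2_addresses(base, offset):
    """Two dword addresses per lane; the offset is a hypothetical parameter."""
    addrs = []
    for lane in range(LANES_PER_CLOCK):
        addrs.append(base + lane)
        addrs.append(base + lane + offset)
    return addrs


conflict_free = read2_addresses(base=0, offset=16)
conflicting = read2_addresses(base=0, offset=32)
print(len(conflict_free), worst_bank_load(conflict_free))
print(len(conflicting), worst_bank_load(conflicting))
```

In this model the first pattern spreads 32 requests over 32 distinct banks, while the second puts two requests on each of 16 banks and would need a second pass.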
I believe it's 4 clocks latency between issuing the read request vs. when the data is available. Is it possible you've issued some prior LDS_READ2s in your code?
This is all of the LDS operations:
19 x: LDS_WRITE ____, R1.x, R0.x
z: ADD_INT T0.z, R1.w, 1
t: ADD R0.x, R5.x, 1.0f
20 x: LDS_READ2_RET QAB, R1.w, PV19.z
21 y: MOV T0.y, QB.pop VEC_120
w: MOV T0.w, QA.pop
Low Latency Access per SIMD Engine
• 0 latency direct reads (Conflict free or Broadcast)
• 1 VLIW instruction latency for LDS indirect Op
We understand what you mean, but the 256 bit bus isn't bottlenecking it at the moment.
Bandwidth isn't a major bottleneck for the HD5870, but it's more of a bottleneck than it was for the HD4870. I also think that, with a faster front-end, RV870 would be bottlenecked by bandwidth... If the RV870's successor brings a faster front-end, higher bandwidth will be needed to realise this advantage...
This is all of the LDS operations:
19 x: LDS_WRITE ____, R1.x, R0.x
z: ADD_INT T0.z, R1.w, 1
t: ADD R0.x, R5.x, 1.0f
20 x: LDS_READ2_RET QAB, R1.w, PV19.z
21 y: MOV T0.y, QB.pop VEC_120
w: MOV T0.w, QA.pop
There's no clause break between the write and the reads because the work group size is equal to the hardware thread size.
Ok thanks, I must be getting confused. I might have been looking at an OpenCL sample that had a default thread group size of 256.
The L2 to L1 cache bandwidth is already 435GB/s on Cypress and the aggregate L1 texture cache bandwidth is 1 TB/s (and these should be at the 850 MHz clock).
Those theoretical flops are parallel across the 20 SIMDs and 1600 SPs, so 1 byte/flop could be reached with just 20 (L1 caches) x 138 GB/s.
Why is no one using some kind of tile-based rendering on discrete graphics?
At least multisampling could be done entirely on chip in a tile cache and not waste frame buffer bandwidth and space. For cards with 60-70 GB/s it could be quite handy.
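The 1 byte/flop estimate above can be reproduced with a few lines of arithmetic. This is a sketch under one stated assumption: each SP is counted as 2 flops per clock (one multiply-add), which the post does not spell out. The SIMD count, SP count, clock and the 138 GB/s figure come from the post; the variable names are illustrative.

```python
SIMDS = 20
STREAM_PROCESSORS = 1600
CLOCK_MHZ = 850
FLOPS_PER_SP_PER_CLOCK = 2   # assumption: one multiply-add counted as 2 flops
L1_GBPS_PER_SIMD = 138       # per-SIMD figure proposed in the post

peak_gflops = STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK * CLOCK_MHZ / 1000
aggregate_l1_gbps = SIMDS * L1_GBPS_PER_SIMD
bytes_per_flop = aggregate_l1_gbps / peak_gflops

print(f"peak: {peak_gflops:.0f} GFLOPS")
print(f"aggregate L1: {aggregate_l1_gbps} GB/s")
print(f"bytes per flop: {bytes_per_flop:.3f}")
```

With these inputs the peak is 2720 GFLOPS against 2760 GB/s, which is just over 1 byte per flop.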
The latency may be higher if there is a bank conflict, in which case it will just stall. If there is no bank conflict, the data will be ready to use next cycle, like your code suggests.
Actual compiled ISA:
20 x: LDS_READ2_RET QAB, R1.w, PV19.z
21 y: MOV T0.y, QB.pop VEC_120
w: MOV T0.w, QA.pop
The earlier snippet I posted has 1 cycle latency between enqueue and pop, same as this snippet.
So I'm not sure what you're saying about latency.
Jawed
As far as their current cache bandwidths go, the caches in general are too small to significantly cut the miss rates.
And their register file?
Small as well. While the numbers may sound large, each thread only gets a limited amount of space.
Each hardware thread can have 128KB of register file if it wants. That's 2KB per work item.
Does anyone know the average ALU clause length of a game like Crysis?
The longest possible clause length is 32 logical cycles, 256 physical cycles.
I wonder how much sense it would make to simply say screw registers, time for a memory to memory architecture. Spilling register sets to cache sounds nice in theory, but if in practice you are pushing/popping all the time it starts to become a bit silly.
Clause temporary registers (Evergreen provides up to 8 of them, prior GPUs only provided 4 - so it seems 4 was too few... I'm not sure why 8 is considered enough, to be honest) are sort of a half-way house. They're registers whose lifetime is highly constrained.
The latency may be higher if there is a bank conflict, in which case it will just stall. If there is no bank conflict, the data will be ready to use next cycle, like your code suggests.
My understanding of Evergreen ISA is that the "MOV dst, QA" instruction is an unnecessary detour:
11 x: LDS_READ2_RET QAB, R10.x, R10.y
12 x: ADD R0.x, R2.x, QA
y: ADD R0.y, R2.y, QA
z: ADD R0.z, R2.z, QA
w: ADD R0.w, R2.w, QA
t: ADD R0.x, R0.x, QB
13 x: ADD R0.y, R0.y, QB
y: ADD R0.z, R0.z, QB
z: ADD R0.w, R0.w, QB.pop
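The idea in the listing above, using the queue heads directly as operands and popping only on the last use, can be mimicked with a toy model. This is a sketch of the concept only: the `ReturnQueue` class, its `peek`/`pop` methods and `lds_read2_ret` are hypothetical stand-ins and do not claim to match the hardware's actual queue semantics.

```python
from collections import deque


class ReturnQueue:
    """Toy FIFO standing in for one LDS return queue (QA or QB)."""

    def __init__(self):
        self._items = deque()

    def push(self, value):
        self._items.append(value)

    def peek(self):
        # Like reading "QA": use the head value, leave it queued.
        return self._items[0]

    def pop(self):
        # Like reading "QA.pop": use the head value and remove it.
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


def lds_read2_ret(lds, addr_a, addr_b, qa, qb):
    """Hypothetical model: enqueue one value per queue from two addresses."""
    qa.push(lds[addr_a])
    qb.push(lds[addr_b])


lds = {0: 1.5, 1: 2.5}
qa, qb = ReturnQueue(), ReturnQueue()
lds_read2_ret(lds, 0, 1, qa, qb)

r2 = [10.0, 20.0, 30.0, 40.0]
r0 = [v + qa.peek() for v in r2]                        # four uses of QA, no pop
r0 = [v + qb.peek() for v in r0[:-1]] + [r0[-1] + qb.pop()]  # last use pops QB
print(r0, len(qa), len(qb))
```

No intermediate MOV into a temporary is needed in this model; QA is never popped here, mirroring the listing.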
Each hardware thread can have 128KB of register file if it wants. That's 2KB per work item.
Jawed
The issue is that with a realistic number of threads what is the actual number available.
6 hardware threads, up to around 670 bytes, is an entirely sane allocation, i.e. 2 threads in control flow, 2 threads in ALU and 2 threads in TEX.
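The per-work-item figures in this exchange follow from simple division. The sketch below assumes a 256KB register file per SIMD and 16-byte (4 x 32-bit) registers, neither of which is stated in the thread; the 64 work items per hardware thread is implied by 128KB / 2KB. All names are illustrative.

```python
REGISTER_FILE_BYTES = 256 * 1024   # assumption: register file per SIMD
WORK_ITEMS_PER_THREAD = 64         # implied by 128KB / 2KB in the thread
BYTES_PER_REGISTER = 16            # assumption: 4 x 32-bit components


def per_work_item(hardware_threads):
    """Whole registers and bytes each work item gets for a given thread count."""
    slots = hardware_threads * WORK_ITEMS_PER_THREAD * BYTES_PER_REGISTER
    registers = REGISTER_FILE_BYTES // slots
    return registers, registers * BYTES_PER_REGISTER


for threads in (2, 6):
    registers, size = per_work_item(threads)
    print(threads, registers, size)
```

Under these assumptions 2 hardware threads get 128 registers (2048 bytes) per work item and 6 hardware threads get 42 registers (672 bytes), in line with the "around 670 bytes" quoted above.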