|
|
#3401 |
|
Senior Member
Join Date: Oct 2002
Posts: 2,436
|
|
|
|
|
|
|
#3402 |
|
Member
Join Date: Jul 2010
Location: Land of Mu
Posts: 350
|
nApoleon says 1120 SPs are official for XT: http://www.chiphell.com/thread-130363-1-1.html
Meanwhile pclab.pl has posted more pics of XFX cards, saying the 6870 has 1280 shaders: http://pclab.pl/news43574.html . Funnily enough, they are unsure about the 6850's final clocks, either 725 or 775 MHz. Speculation: the Barts chip does indeed have 1280 SPs, but the 6870 and 6850 are salvage parts with parts of the chip deactivated. They seem to be "good enough" anyway, while the full Barts chips can be saved for either a 6890 or a dual card (Antilles?). That would explain the architecture quirks and all this confusion. |
|
|
|
|
|
#3403 |
|
Member
Join Date: Jan 2010
Posts: 845
|
Honestly AMD must be pissing themselves with laughter.
|
|
|
|
|
|
#3404 |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
Anyone asking what happened to the rumored 4 slot VLIW units? Guess you will see some ISA code for that tonight
Last edited by Gipsel; 19-Oct-2010 at 00:29. |
|
|
|
|
|
#3405 |
|
Regular
|
You sneaky chappy! Hmm, very interesting.
__________________
Can it play WoW? |
|
|
|
|
|
#3406 |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
|
|
|
|
|
|
#3407 | |
|
yes, i'm drunk
|
Quote:
For someone who doesn't understand crap about shader code etc., what exactly does Gipsel's post mean?
__________________
I'm nothing but a shattered soul... Been ravaged by the chaotic beauty... Ruined by the unreal temptations... I was betrayed by my own beliefs... |
|
|
|
|
|
|
#3408 | |
|
Member
Join Date: Jul 2010
Location: Land of Mu
Posts: 350
|
Quote:
But I think I got it: RCP = reciprocal. And if it takes three slots, starting from x, then it means the shaders have indeed been buffed, but not in a way I'd like them to be. Still, it could be pretty efficient if there's a 4D VLIW structure. |
|
|
|
|
|
|
#3409 | |
|
Registered
Join Date: Feb 2007
Posts: 64
|
Quote:
|
|
|
|
|
|
|
#3410 | |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
Quote:
And as expected, double precision runs at 1/4 of the SP rate, at least for FMA and MUL; for ADD it's 1/2. |
|
|
|
|
|
|
#3411 | |
|
Senior Member
Join Date: Oct 2002
Posts: 2,436
|
Quote:
If even RCP is taking 3 slots, are the other transcendentals taking all 4 (given that RCP should be the cheapest one)? |
|
|
|
|
|
|
#3412 | |
|
KEPLER
Join Date: Jun 2005
Posts: 1,892
|
Quote:
__________________
People like you - Silent_Buddha laying an epic smackdown on XMAN26's double standards. So you're mixing apples and oranges to calculate grapes and then compare it to apples. - silent_guy's witty retort on sweeping comparisons. |
|
|
|
|
|
|
#3413 | ||
|
Member
Join Date: Jul 2010
Location: Land of Mu
Posts: 350
|
Quote:
Quote:
|
||
|
|
|
|
|
#3414 | ||
|
Regular
|
Quote:
Here's something old that I've tweaked for extra transcendentals:
Code:
il_ps_2_0
dcldef_x(*)_y(*)_z(*)_w(*) r0
def c0, 0.6931471825, 1.442695022, 0.0, 0.0
dclpin_usage(color)_usageIndex(10)_x(*)_y(*)_z(*)_w(*)_centroid vPixIn0
dclpin_usage(color)_usageIndex(26)_x(*)_y(*)_z(*)_w(*)_centroid vPixIn1
mov r0, vPixIn0
add r1, r0, vPixIn1
log_zeroop(fltmax) r2.x___, r1.x_abs
log_zeroop(fltmax) r2._y__, r1.y_abs
log_zeroop(fltmax) r2.__z_, r1.z_abs
log_zeroop(fltmax) r2.___w, r1.w_abs
mul r1, r2, c0.x
mul r0, r0, vPixIn1
mul r0, r0, c0.y
exp r0.x___, r0.x
rcp_zeroop(infinity) r2.x___, r0.x
exp r0.x___, r0.y
rcp_zeroop(infinity) r2._y__, r0.x
exp r0.x___, r0.z
exp r0._y__, r0.w
rcp_zeroop(infinity) r2.___w, r0.y
rcp_zeroop(infinity) r2.__z_, r0.x
mul r0, r1, r2
dp4 r1.x___, r0, r0
rsq_zeroop(infinity) r1.x___, r1.x_abs
mul r37, r0, r1.x
colorclamp oC0, r37
end
Quote:
__________________
Can it play WoW? |
||
|
|
|
|
|
#3415 |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
|
|
|
|
|
|
#3416 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,089
|
What is the throughput of RCP when issued through the current T lane?
I'm trying to think of what sequence of operations can be done across three lanes. A successive approximation of an RCP would take 5 iterations to yield a single-precision result if using plain math ops. Perhaps a more elaborate method is used?
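For reference, a minimal sketch of the kind of successive approximation meant here, assuming a Newton-Raphson iteration that uses only multiplies and subtracts, with a deliberately crude seed of 1.0 on a mantissa scaled into [0.5, 1). The name `rcp_newton` and the seed choice are illustrative assumptions and say nothing about what the hardware actually does. With that seed the relative error starts at up to 1/2 and squares on every step, so in the worst case five steps get below 2^-24 while four do not.

```python
import math

def rcp_newton(a, iterations=5):
    """Approximate 1/a with Newton-Raphson, using only multiply and subtract."""
    if a == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # abs(a) = m * 2**e with 0.5 <= m < 1
    m, e = math.frexp(abs(a))
    # Crude seed (an assumption for this sketch): relative error 1 - m*x is at most 0.5
    x = 1.0
    for _ in range(iterations):
        # x_next = x * (2 - m*x); the relative error squares on each step
        x = x * (2.0 - m * x)
    # 1/a = (1/m) * 2**-e, with the sign of a restored
    return math.copysign(math.ldexp(x, -e), a)

if __name__ == "__main__":
    for value in (0.5, 3.0, -4.0):
        approx = rcp_newton(value)
        print(value, approx, abs(approx * value - 1.0))
```

A better seed (a small lookup table or a linear fit) would cut the number of steps, which is one reason a hardware unit need not look anything like this loop.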
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#3417 | ||
|
Regular
|
Quote:
Quote:
http://forum.beyond3d.com/showthread...26#post1417026
Back then the question became one of throughput. Also, it's worth bearing in mind that for graphics the error can be quite large (it's single precision for a start). I didn't put any trig in the shader earlier, because that gets swamped by operand normalisation (or at least it does some of the time; I'm not sure, as it's something I haven't studied).
__________________
Can it play WoW? |
||
|
|
|
|
|
#3418 | |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
Quote:
Code:
il_ps_2_0
dcl_literal l1, 0.6931471825, 1.442695022, 0.0, 0.0
dcl_literal l0, 0x10000, 0xffffffff, 2.1, 0.01
mov r13.x, l0.x ; loop counter
mov r1.xyzw,l0.zzzz ; arbitrary values, r1 = 2*2-2 = 2
mov r2.xyzw,l0.wwww ; r2 = 0*0+0 = 0
mov r0, l0
add r1, r0, l0.wzyx
whileloop
break_logicalz r13 ; while(r13 > 0)
log_zeroop(fltmax) r2.x___, r1.x_abs
log_zeroop(fltmax) r2._y__, r1.y_abs
log_zeroop(fltmax) r2.__z_, r1.z_abs
log_zeroop(fltmax) r2.___w, r1.w_abs
mul r1, r2, l1.x
mul r0, r0, l0.wzyx
mul r0, r0, l1.y
exp r0.x___, r0.x
rcp_zeroop(infinity) r2.x___, r0.x
exp r0.x___, r0.y
rcp_zeroop(infinity) r2._y__, r0.x
exp r0.x___, r0.z
exp r0._y__, r0.w
rcp_zeroop(infinity) r2.___w, r0.y
rcp_zeroop(infinity) r2.__z_, r0.x
mul r0, r1, r2
dp4 r1.x___, r0, r0
rsq_zeroop(infinity) r1.x___, r1.x_abs
mul r1, r0, r1.x
iadd r13.x, r13.x, l0.y ; counter--
endloop
mov g[0], r1
end
On Cypress the ISA looks like this:
Code:
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(13)
0 x: MOV R0.x, (0x00010000, 9.183549616e-41f).x
y: MOV R0.y, (0xFFFFFFFF, -1.#QNANf).y
z: MOV R0.z, (0x40066666, 2.099999905f).z
w: MOV R0.w, (0x3C23D70A, 0.009999999776f).w
t: MOV R1.x, (0x3C23D70A, 0.009999999776f).w
1 x: MOV R2.x, (0x00010000, 9.183549616e-41f).x
y: MOV R1.y, (0xFFFFFFFF, -1.#QNANf).y
z: MOV R1.z, (0xFFFFFFFF, -1.#QNANf).y
w: MOV R1.w, (0x3C23D70A, 0.009999999776f).z
01 LOOP_DX10 i0 FAIL_JUMP_ADDR(5) VALID_PIX
02 ALU_BREAK: ADDR(45) CNT(1)
2 x: PREDNE_INT ____, R2.x, 0.0f UPDATE_EXEC_MASK UPDATE_PRED
03 ALU: ADDR(46) CNT(44)
3 x: MUL ____, R0.z, (0xFFFFFFFF, -1.#QNANf).x
y: MUL ____, R0.y, (0x40066666, 2.099999905f).y
z: MUL ____, R0.x, (0x3C23D70A, 0.009999999776f).z
w: MUL ____, R0.w, (0x00010000, 9.183549616e-41f).w
t: LOG_sat T0.z, |R1.x|
4 x: MUL T0.x, PV3.x, (0x3FB8AA3B, 1.442695022f).x
y: MUL T0.y, PV3.y, (0x3FB8AA3B, 1.442695022f).x
z: MUL T1.z, PV3.z, (0x3FB8AA3B, 1.442695022f).x
w: MUL T0.w, PV3.w, (0x3FB8AA3B, 1.442695022f).x
t: LOG_sat ____, |R1.y|
5 x: ADD_INT R2.x, -1, R2.x
y: MUL T1.y, PS4, (0x3F317218, 0.6931471825f).x
z: MUL T0.z, T0.z, (0x3F317218, 0.6931471825f).x
t: LOG_sat ____, |R1.z|
6 x: MUL T2.x, PS5, (0x3F317218, 0.6931471825f).x
t: LOG_sat ____, |R1.w|
7 w: MUL T1.w, PS6, (0x3F317218, 0.6931471825f).x
t: EXP_e T1.z, T1.z
8 t: EXP_e T1.x, T0.y
9 t: EXP_e T2.z, T0.x
10 t: EXP_e T0.y, T0.w
11 t: RCP_e ____, T1.z
12 x: MUL R0.x, T0.z, PS11
t: RCP_e ____, T1.x
13 y: MUL R0.y, T1.y, PS12
t: RCP_e T1.x, T0.y
14 t: RCP_e ____, T2.z
15 z: MUL R0.z, T2.x, PS14
w: MUL R0.w, T1.w, T1.x
16 x: DOT4 ____, R0.x, R0.x
y: DOT4 ____, R0.y, R0.y
z: DOT4 ____, PV15.z, PV15.z
w: DOT4 ____, PV15.w, PV15.w
17 t: RSQ_e ____, |PV16.x|
18 x: MUL R1.x, R0.x, PS17
y: MUL R1.y, R0.y, PS17
z: MUL R1.z, R0.z, PS17
w: MUL R1.w, R0.w, PS17
04 ENDLOOP i0 PASS_JUMP_ADDR(2)
05 MEM_EXPORT_WRITE: DWORD_PTR[0], R1, ELEM_SIZE(3)
06 ALU: ADDR(90) CNT(4)
19 x: MOV R0.x, 0.0f
y: MOV R0.y, 0.0f
z: MOV R0.z, 0.0f
w: MOV R0.w, 0.0f
07 EXP_DONE: PIX0, R0
END_OF_PROGRAM
Code:
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(13)
0 x: MOV R0.x, (0x00010000, 9.183549616e-41f).x
y: MOV R0.y, (0xFFFFFFFF, -1.#QNANf).y
z: MOV R0.z, (0x40066666, 2.099999905f).z
w: MOV R0.w, (0x3C23D70A, 0.009999999776f).w
1 x: MOV R1.x, (0x3C23D70A, 0.009999999776f).x
y: MOV R1.y, (0xFFFFFFFF, -1.#QNANf).y
z: MOV R1.z, (0xFFFFFFFF, -1.#QNANf).y
w: MOV R1.w, (0x3C23D70A, 0.009999999776f).x
2 x: MOV R2.x, (0x00010000, 9.183549616e-41f).x
01 LOOP_DX10 i0 FAIL_JUMP_ADDR(5) VALID_PIX
02 ALU: ADDR(45) CNT(1)
3 x: PREDNE_INT ____, R2.x, 0.0f UPDATE_EXEC_MASK BREAK UPDATE_PRED
03 ALU: ADDR(46) CNT(71)
4 x: MUL ____, R0.z, (0xFFFFFFFF, -1.#QNANf).x
y: MUL ____, R0.y, (0x40066666, 2.099999905f).y
z: MUL ____, R0.x, (0x3C23D70A, 0.009999999776f).z
w: MUL ____, R0.w, (0x00010000, 9.183549616e-41f).w
5 x: MUL T0.x, PV4.x, (0x3FB8AA3B, 1.442695022f).x
y: MUL T0.y, PV4.w, (0x3FB8AA3B, 1.442695022f).x
z: MUL T0.z, PV4.z, (0x3FB8AA3B, 1.442695022f).x
w: MUL T0.w, PV4.y, (0x3FB8AA3B, 1.442695022f).x
6 x: LOG_sat ____, |R1.x|
y: LOG_sat ____, |R1.x|
z: LOG_sat ____, |R1.x|
7 x: LOG_sat ____, |R1.y|
y: LOG_sat ____, |R1.y|
z: LOG_sat ____, |R1.y|
w: MUL T1.w, PV6.x, (0x3F317218, 0.6931471825f).x
8 x: LOG_sat ____, |R1.z|
y: LOG_sat ____, |R1.z|
z: LOG_sat ____, |R1.z|
w: MUL T2.w, PV7.z, (0x3F317218, 0.6931471825f).x
9 x: LOG_sat ____, |R1.w|
y: LOG_sat ____, |R1.w|
z: LOG_sat ____, |R1.w|
w: MUL T3.w, PV8.y, (0x3F317218, 0.6931471825f).x
10 x: EXP_e ____, T0.z
y: EXP_e T1.y, T0.z
z: EXP_e ____, T0.z
w: MUL R0.w, PV9.x, (0x3F317218, 0.6931471825f).x
11 x: EXP_e ____, T0.w
y: EXP_e ____, T0.w
z: EXP_e T0.z, T0.w
12 x: EXP_e T0.x, T0.x
y: EXP_e ____, T0.x
z: EXP_e ____, T0.x
13 x: EXP_e ____, T0.y
y: EXP_e ____, T0.y
z: EXP_e T1.z, T0.y
14 x: RCP_e T1.x, T1.y
y: RCP_e ____, T1.y
z: RCP_e ____, T1.y
15 x: RCP_e ____, T0.z
y: RCP_e T1.y, T0.z
z: RCP_e ____, T0.z
16 x: RCP_e ____, T1.z
y: RCP_e T0.y, T1.z
z: RCP_e ____, T1.z
17 x: RCP_e ____, T0.x
y: RCP_e ____, T0.x
z: RCP_e ____, T0.x
18 x: MUL R0.x, T1.w, T1.x
y: MUL R0.y, T2.w, T1.y VEC_120
z: MUL R0.z, T3.w, PV17.x VEC_201
19 x: ADD_INT R2.x, -1, R2.x
w: MUL R0.w, R0.w, T0.y
20 x: DOT4 ____, R0.x, R0.x
y: DOT4 ____, R0.y, R0.y
z: DOT4 ____, R0.z, R0.z
w: DOT4 ____, PV19.w, PV19.w
21 x: RSQ_e ____, |PV20.x|
y: RSQ_e ____, |PV20.x|
z: RSQ_e ____, |PV20.x|
22 x: MUL R1.x, R0.x, PV21.y
y: MUL R1.y, R0.y, PV21.y
z: MUL R1.z, R0.z, PV21.y
w: MUL R1.w, R0.w, PV21.y
04 ENDLOOP i0 PASS_JUMP_ADDR(2)
05 MEM_EXPORT_WRITE: DWORD_PTR[0], R1, ELEM_SIZE(3) VPM
06 ALU: ADDR(117) CNT(4)
23 x: MOV R0.x, 0.0f
y: MOV R0.y, 0.0f
z: MOV R0.z, 0.0f
w: MOV R0.w, 0.0f
07 EXP_DONE: PIX0, R0
08 END
END_OF_PROGRAM
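To compare the two listings without counting by hand, here is a rough Python sketch that tallies instruction groups and occupied slots in disassembly text of this shape. It assumes a group starts on a line with an index followed by a slot letter (x, y, z, w or t) and that the following index-less slot lines belong to the same group. The function names and the regular expressions are my own guesses at the layout, not part of any AMD tool. In the listings above the loop-body clause runs from group 3 to 18 in the first and from group 4 to 22 in the second, which has no t slot.

```python
import re

# A group starts with its index, e.g. "12 x: MUL R0.x, T0.z, PS11".
GROUP_START = re.compile(r"^\s*(\d+)\s+([xyzwt]):\s+(\S+)")
# Further slots of the same group carry no index, e.g. "t: RCP_e ____, T1.x".
SLOT_ONLY = re.compile(r"^\s*([xyzwt]):\s+(\S+)")

def parse_groups(listing):
    """Return a list of instruction groups, each a list of (slot, opcode)."""
    groups = []
    for line in listing.splitlines():
        start = GROUP_START.match(line)
        if start:
            groups.append([(start.group(2), start.group(3))])
            continue
        slot = SLOT_ONLY.match(line)
        if slot and groups:
            groups[-1].append((slot.group(1), slot.group(2)))
        # Clause headers such as "03 ALU: ADDR(46) CNT(44)" match neither
        # pattern and are skipped.
    return groups

def summarize(listing):
    """Return (number of groups, number of occupied slots)."""
    groups = parse_groups(listing)
    return len(groups), sum(len(g) for g in groups)

# Excerpt from the Cypress listing: RCP sits in the t slot.
CYPRESS_EXCERPT = """\
11 t: RCP_e ____, T1.z
12 x: MUL R0.x, T0.z, PS11
t: RCP_e ____, T1.x
13 y: MUL R0.y, T1.y, PS12
t: RCP_e T1.x, T0.y
"""

# Excerpt from the second listing: each RCP occupies x, y and z.
SECOND_EXCERPT = """\
14 x: RCP_e T1.x, T1.y
y: RCP_e ____, T1.y
z: RCP_e ____, T1.y
15 x: RCP_e ____, T0.z
y: RCP_e T1.y, T0.z
z: RCP_e ____, T0.z
"""

if __name__ == "__main__":
    print(summarize(CYPRESS_EXCERPT))
    print(summarize(SECOND_EXCERPT))
```

Feeding the full listings through `summarize` gives groups and occupied slots per program; it does not model clause overhead or latency, so treat the numbers as a packing comparison only.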
Last edited by Gipsel; 19-Oct-2010 at 01:20. |
|
|
|
|
|
|
#3419 | |||
|
Member
Join Date: Dec 2009
Posts: 582
|
Quote:
Quote:
Quote:
|
|||
|
|
|
|
|
#3420 |
|
Senior Member
Join Date: Oct 2002
Posts: 2,436
|
|
|
|
|
|
|
#3421 | |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
Quote:
Code:
4 x: MULLO_INT R3.x, R2.x, R2.x
y: MULLO_INT ____, R2.x, R2.x
z: MULLO_INT ____, R2.x, R2.x
w: MULLO_INT ____, R2.x, R2.x
5 x: MULHI_INT R4.x, R2.x, R2.x
y: MULHI_INT ____, R2.x, R2.x
z: MULHI_INT ____, R2.x, R2.x
w: MULHI_INT ____, R2.x, R2.x
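As a reading aid for the two opcodes above, here is a small Python model. It assumes the usual ISA meaning: both operands are signed 32-bit integers, MULLO_INT returns the low 32 bits of the 64-bit product and MULHI_INT the high 32 bits. The helper names are made up for illustration, and the model says nothing about how the hardware spreads the work over the four slots.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def mullo_int(a, b):
    """Low 32 bits of the signed product (assumed MULLO_INT behaviour)."""
    return to_signed32(to_signed32(a) * to_signed32(b))

def mulhi_int(a, b):
    """High 32 bits of the signed 64-bit product (assumed MULHI_INT behaviour)."""
    # Python's >> on a negative int is an arithmetic shift, which is what we want.
    return to_signed32((to_signed32(a) * to_signed32(b)) >> 32)

if __name__ == "__main__":
    r2x = 0x10000  # the initial loop-counter literal from the IL earlier in the thread
    print(mullo_int(r2x, r2x), mulhi_int(r2x, r2x))
```

Squaring 0x10000 is a handy check because the product is exactly 2^32, so the low half is 0 and the high half is 1.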
|
|
|
|
|
|
|
#3422 | |
|
Member
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 987
|
Quote:
Let me check.
Edit: Integer-to-float and float-to-integer conversions are handled by all 4 slots. Not together, but each slot does one conversion. So the throughput is 4 conversions per cycle max. That's quite a bit faster than Evergreen (conversions only in the t unit).
Code:
4 x: F_TO_I R2.x, R1.x
y: F_TO_I R2.y, R1.y
z: F_TO_I R2.z, R1.z
w: F_TO_I R2.w, R1.w
Code:
4 x: RNDNE R2.x, R1.x
y: RNDNE R2.y, R1.y
z: RNDNE R2.z, R1.z
w: RNDNE R2.w, R1.w
Code:
4 x: FLT32_TO_FLT16_RTZ__NI R2.x, R1.x
y: FLT32_TO_FLT16_RTZ__NI R2.y, R1.y
z: FLT32_TO_FLT16_RTZ__NI R2.z, R1.z
w: FLT32_TO_FLT16_RTZ__NI R2.w, R1.w
Last edited by Gipsel; 19-Oct-2010 at 02:59. |
|
|
|
|
|
|
#3423 |
|
Senior Member
|
Charlie was right about the 4 symmetric ALU rumour.
|
|
|
|
|
|
#3424 | |
|
Senior Member
|
Quote:
|
|
|
|
|
|
|
#3425 | |
|
Senior Member
Join Date: Oct 2002
Posts: 2,436
|
Quote:
|
|
|
|
|
| Tags |
| Barts, Cayman |
|
|