Is this RSX "math" correct?

Cheezdoodles

Code:
NVIDIA RSX
550MHz Core
136 Shader Operations per Cycle

24 Pixel Pipelines (2 Vector + 2 Scalar + 1 Texture ALUs)
8 Vertex Pipelines (1 Vector + 1 Scalar ALUs)

(24 x 5) + (8 x 2) = 136
550MHz x 96 = 52.8 Billion Pixel Shader Ops/Sec
550MHz x 24 = 13.2 Billion Texture Address Ops/Sec
550MHz x 16 = 8.8 Billion Vertex Shader Ops/Sec

?
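As a rough check of the arithmetic in the post above, the figures can be recomputed in a few lines of Python. This is only a sketch: the pipe and ALU counts are the thread's speculative numbers, not confirmed specs, and the names `ops_per_clock` and `billions_per_second` are made up for illustration.

```python
# Figures as posted in the thread; they are forum speculation,
# not confirmed hardware specifications.
CLOCK_MHZ = 550
PIXEL_PIPES = 24
VERTEX_PIPES = 8

# Per-pipe units as listed in the post.
PIXEL_SHADER_ALUS = 2 + 2   # 2 vector + 2 scalar
PIXEL_TEXTURE_ALUS = 1      # 1 texture ALU
VERTEX_ALUS = 1 + 1         # 1 vector + 1 scalar


def ops_per_clock():
    """Return the per-clock op counts, split the same way as the post."""
    return {
        "pixel": PIXEL_PIPES * PIXEL_SHADER_ALUS,
        "texture": PIXEL_PIPES * PIXEL_TEXTURE_ALUS,
        "vertex": VERTEX_PIPES * VERTEX_ALUS,
    }


def billions_per_second(ops, clock_mhz=CLOCK_MHZ):
    """MHz times ops/clock is millions of ops/sec; divide by 1000 for billions."""
    return clock_mhz * ops / 1000


counts = ops_per_clock()
total = sum(counts.values())
print("total ops per clock:", total)
for name, ops in counts.items():
    print(name, ops, "ops/clock,", billions_per_second(ops), "billion ops/sec")
```

The sum only reaches 136 if every unit, including the texture ALU, is counted as issuing one op on every clock, which is exactly the assumption the question is asking about.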
 
Nope. From what I know, in every pipeline one of the two shader ALUs is tied to the texture ALU, so in every clock cycle you have to choose one or the other, not both at the same time.
 
I believe one takes texture duty on demand... it's not forced. So if you have no texture ops to take care of, the two can be used for arithmetic. Second, when an ALU is being used for texture ops, apparently it's more like half the ALU is unavailable... apparently some ops can still run on the 'texture ALU' when it's being used as such.
 
The correct PS setup for the official 136 number is: 24 Pixel Pipelines (2 FP32 Vector + 2 FP32 Scalar + 1 FP16 NRM).
And you shouldn't hope to schedule two ops at the same time as the texture op in the first superscalar FP32 ALU. So you can safely conclude that's not the way they're counting their meaningless "ops". Still, I'll admit I don't know exactly how that works (tex + 1 op), if at all. :)

Uttar
 
If you're going to be on this forum, you may as well read the threads that are posted here! ;) It's unofficially confirmed by reliable sources. Not absolutely certain, but it'll be more a surprise if the clock speed isn't 500 MHz than if it is.
 
OK... it's just that I usually don't talk things up as fact when they're just a rumour, for better or for worse.


thanks
 
I believe one takes texture duty on demand... it's not forced. So if you have no texture ops to take care of, the two can be used for arithmetic.

That makes the math wrong. He is assuming it can use all the shaders AND do texture operations at once; he has to swap the logical AND for a logical OR and redo the math.

I can play soccer, I can sleep 10 hours, but not both at the same time.
It's the same for RSX: overall there are few texture ops and a lot of shading, but PER CLOCK you EITHER fully use the two shader ALUs in a pipeline OR do some texturing.

500MHz Core
PER-CLOCK when not texturing
24 Pixel Pipelines (2 Vector + 2 Scalar )
8 Vertex Pipelines (1 Vector + 1 Scalar ALUs)

Pixel ops per clock = 24x4 = 96 Shads/Clk
Vertex ops per clock = 8x2 = 16 Shads/Clk

(these have to be counted separately because it's not a unified shader architecture)

500MHz x 96 = 48 Billion Pixel Shader Ops/Sec
500MHz x 16 = 8.0 Billion Vertex Shader Ops/Sec

PER-CLOCK when texturing
24 Pixel Pipelines (1 Vector + 1 Scalar + 1 Texture )
8 Vertex Pipelines (1 Vector + 1 Scalar ALUs)

Pixel ops per clock = 24x2 = 48 Shads/Clk
Text ops per clock = 24 TexOp/Clk
Vertex ops per clock = 8x2 = 16 Shads/Clk

500MHz x 48 = 24 Billion Pixel Shader Ops/Sec
500MHz x 16 = 8.0 Billion Vertex Shader Ops/Sec
500MHz x 24 = 12.0 Billion Texture Address Ops/Sec


So the real situation, IMHO, will fall somewhere between:
48 Billion Pixel Shader Ops/Sec
and
24 Billion Pixel Shader Ops/Sec + 12.0 Billion Texture Address Ops/Sec

always with:
8.0 Billion Vertex Shader Ops/Sec


I don't know if this will turn out to be exact, but in my opinion it's fairer and more precise than the original math.
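To make the two bounds above concrete, here is a small Python sketch. It assumes the either/or model described in this post (4 shader ops per pipe on a non-texturing clock, 2 shader ops plus 1 texture op on a texturing clock) and, as an extra simplifying assumption not stated in the thread, blends linearly between the two cases by the fraction of clocks spent texturing. The function name `pixel_throughput` is hypothetical.

```python
# Speculative figures from the thread, not confirmed specs.
CLOCK_MHZ = 500
PIXEL_PIPES = 24


def pixel_throughput(texturing_fraction):
    """Return (pixel shader Gops/s, texture Gops/s).

    texturing_fraction is the share of clocks on which a pipe textures.
    Assumed model: a non-texturing clock issues 4 shader ops per pipe,
    a texturing clock issues 2 shader ops plus 1 texture op per pipe.
    """
    if not 0.0 <= texturing_fraction <= 1.0:
        raise ValueError("texturing_fraction must be between 0 and 1")
    shader_ops_per_pipe = 4 * (1 - texturing_fraction) + 2 * texturing_fraction
    texture_ops_per_pipe = texturing_fraction
    # MHz times ops/clock is millions of ops/sec; divide by 1000 for billions.
    shader_gops = CLOCK_MHZ * PIXEL_PIPES * shader_ops_per_pipe / 1000
    texture_gops = CLOCK_MHZ * PIXEL_PIPES * texture_ops_per_pipe / 1000
    return shader_gops, texture_gops


for fraction in (0.0, 0.5, 1.0):
    print(fraction, pixel_throughput(fraction))
```

At a fraction of 0 this gives the 48 billion shader ops/sec upper bound, and at 1 it gives the 24 billion shader ops/sec plus 12 billion texture ops/sec lower bound; anything in between depends on the shader mix.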
 
I can play soccer, I can sleep 10 hours, but not both at the same time.
Maybe someone can borrow your soccer shoes while you are sleeping?

I remember an NVIDIA patent concerning ALUs being shared between fixed-function hardware and shaders. I don't know if it means anything in this context.

EDIT: An ALU usually consists of several units: add/sub, multiplier...
 
PER-CLOCK when not texturing
24 Pixel Pipelines (2 Vector + 2 Scalar )
8 Vertex Pipelines (1 Vector + 1 Scalar ALUs)

Pixel ops per clock = 24x4 = 96 Shads/Clk
Vertex ops per clock = 8x2 = 16 Shads/Clk
Read Uttar's post again. You forgot the normalization. If you add it in as a fifth per-pixel operation, you have five ops per clock per pixel pipe, and that actually lets you reach the official number of 5*24+16=136 ops per clock.
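The difference the normalize makes is easy to see in a quick Python sketch. The counts are the thread's own numbers, and `total_ops_per_clock` is a hypothetical helper used only for this comparison.

```python
# Pipe counts as discussed in the thread (speculative, not confirmed).
PIXEL_PIPES = 24
VERTEX_PIPES = 8


def total_ops_per_clock(pixel_ops_per_pipe, vertex_ops_per_pipe=2):
    """Sum pixel and vertex ops issued per clock across all pipes."""
    return PIXEL_PIPES * pixel_ops_per_pipe + VERTEX_PIPES * vertex_ops_per_pipe


# 2 vector + 2 scalar, no normalize counted.
print(total_ops_per_clock(4))
# 2 vector + 2 scalar + 1 FP16 normalize, as in Uttar's post.
print(total_ops_per_clock(5))
```

Only the second call matches the quoted 136 figure, which supports counting the FP16 normalize rather than the texture op as the fifth per-pipe operation.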
 
If RSX = 24 pixel shaders pipe (24 * 27 flops / 5 shader ops X 24 = 648 flops total - include norm) + 2 vertex shaders ( 2 shader ops *2 / 2 *10 flops something like 250/275millions verts /sec = 2 * 500MHz/550MHz/ 4 cycles)?


(I don't see any mention of vertex shaders on the GDC06 slide... only pixel shaders = 384 flops...)
 
That makes the math wrong. He is assuming it can use all the shaders AND do texture operations

His figures don't say that; they're presented as separate maxima. Doing so doesn't make explicit the balancing between arithmetic and texture ops that occurs, but it doesn't preclude it either.
 